HBase Interview Questions & Answers

1. What is Apache HBase?


Apache HBase is an open-source, distributed Hadoop project that has its genesis in Google’s Bigtable. It is written in Java and is now an integral part of the Apache Software Foundation as well as the Hadoop ecosystem.

2. What is HBaseFsck class?


HBase ships with a tool named hbck, which is implemented by the HBaseFsck class. It offers several command-line switches that influence its behavior.

3. What is REST?


REST (Representational State Transfer) defines the semantics that let us use a protocol such as HTTP in a generic way to address remote resources. It also supports different message formats for communicating with the server, which gives a client application many choices.

4. Define Thrift?


Apache Thrift is written in C++, but it offers schema compilers for many programming languages, including Java, C++, Perl, PHP, Python, Ruby, and more.

5. What are the fundamental key structures of HBase?


Row key and Column key are the fundamental key structures of HBase.

6. What is JMX?


Java Management Extensions (JMX) is the standard technology for exporting the status of Java applications.

5. What is Nagios?


Nagios is a commonly used support tool for gaining qualitative data about cluster status. It polls current metrics on a regular basis and compares them with given thresholds.

6. What is the syntax of describe Command?


Syntax –

hbase> describe 'tablename'

7. What is the use of exists command?


The exists command checks whether the specified table exists.

8. What is the use of MasterServer?


The MasterServer assigns regions to the region servers and handles load balancing.

9. What is HBase Shell?


HBase Shell is the command-line interface through which we communicate with HBase. It is built on JRuby and uses the HBase Java client API underneath.

10. What is the use of ZooKeeper?


ZooKeeper maintains configuration information and supports communication between region servers and clients. It also offers distributed synchronization.

11. Define catalog tables in HBase?


Catalog tables maintain HBase's metadata, such as the location of the regions in the cluster.

12. Define cell in HBase?


A cell is the smallest unit of an HBase table. It stores data in the form of a tuple: a {row, column, version} tuple identifies exactly one cell.
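To make the tuple idea concrete, here is a minimal sketch in plain Python, not the HBase API. It assumes a cell is addressed by a (row, column, version) key. The names CellKey, put, and get_latest are hypothetical and exist only for this illustration.

```python
from typing import Dict, NamedTuple, Optional


class CellKey(NamedTuple):
    """Hypothetical key for one cell: row, column, and version together."""
    row: bytes
    column: str   # written as "family:qualifier"
    version: int  # a timestamp acts as the version


# Toy in-memory "table": every cell is one key -> value entry.
table: Dict[CellKey, bytes] = {}


def put(row: bytes, column: str, version: int, value: bytes) -> None:
    """Store a value under its full (row, column, version) coordinate."""
    table[CellKey(row, column, version)] = value


def get_latest(row: bytes, column: str) -> Optional[bytes]:
    """Return the value with the highest version for this row and column."""
    keys = [k for k in table if k.row == row and k.column == column]
    if not keys:
        return None
    return table[max(keys, key=lambda k: k.version)]


put(b"user1", "info:name", 1, b"Ann")
put(b"user1", "info:name", 2, b"Anna")
print(get_latest(b"user1", "info:name"))
```

Two writes to the same row and column produce two cells because their versions differ, and a read returns the newest one by default.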

13. Define compaction in HBase?


Compaction is the process of merging several HFiles into one file. Once the merged file has been created, the old files are deleted.

14. What is the use of HColumnDescriptor class?


The HColumnDescriptor class stores information about a column family, such as compression settings and the number of versions.

15. What is the function of HMaster?


The HMaster (MasterServer) is responsible for monitoring all RegionServer instances in the cluster.

16. How many compaction types are in HBase?


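HBase has two types of compaction:

Minor Compaction: merges a few smaller HFiles into one larger HFile.
Major Compaction: merges all HFiles of a store into a single file and removes deleted and expired cells.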
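The difference between the two types can be sketched with a toy model in plain Python. This is not HBase code: each "HFile" is assumed to be a dict mapping a row key to a (timestamp, value) pair, a value of None stands in for a delete marker, and the rule of picking the smallest files is a simplification of HBase's real selection policy. All function names are hypothetical.

```python
TOMBSTONE = None  # hypothetical stand-in for a delete marker


def merge_files(files, drop_deletes):
    """Merge toy HFiles, keeping the newest (timestamp, value) per key."""
    merged = {}
    for hfile in files:
        for key, (ts, value) in hfile.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    if drop_deletes:
        # Only safe when every file of the store takes part in the merge.
        merged = {k: v for k, v in merged.items() if v[1] is not TOMBSTONE}
    return dict(sorted(merged.items()))


def minor_compaction(store, count):
    """Merge the `count` smallest files into one and keep delete markers."""
    ordered = sorted(store, key=len)
    picked, rest = ordered[:count], ordered[count:]
    return rest + [merge_files(picked, drop_deletes=False)]


def major_compaction(store):
    """Merge every file into a single file and drop deleted cells."""
    return [merge_files(store, drop_deletes=True)]


store = [
    {"r1": (1, "a"), "r2": (1, "b")},
    {"r2": (2, TOMBSTONE)},  # r2 was deleted later
    {"r3": (3, "c")},
]
after_minor = minor_compaction(store, 2)
after_major = major_compaction(store)
print(len(after_minor), after_major)
```

The minor compaction keeps the delete marker for r2 because the older value still lives in a file that was not part of the merge. The major compaction sees every file, so it can drop both the marker and the value it hides.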

17. Define HRegionServer in HBase


HRegionServer is the server responsible for managing and serving regions.

18. Which filter accepts the page size as the parameter in HBase?


A filter named PageFilter accepts the page size as the parameter.
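The paging idea behind such a filter can be sketched in plain Python. This is a simplified model, not the PageFilter implementation: rows are assumed to be a sorted list of (row_key, value) pairs held in one place, and scan_page is a hypothetical helper. In a real cluster the filter is evaluated on each region server separately, so a client may still need to enforce the page size itself.

```python
def scan_page(rows, start_row, page_size):
    """Return at most page_size rows whose key is >= start_row."""
    page = []
    for key, value in rows:
        if key < start_row:
            continue
        if len(page) >= page_size:
            break
        page.append((key, value))
    return page


rows = [(f"row{i:02d}", i) for i in range(7)]

first = scan_page(rows, "", 3)
# Appending "\0" to the last key seen gives the smallest key that sorts
# after it, so the next page starts right behind the previous one.
second = scan_page(rows, first[-1][0] + "\0", 3)
print(first, second)
```

Each call returns one page, and the last row key of a page is the only state the client has to remember in order to fetch the next one.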

19. Which method is used to access HFile directly without using HBase?


To access an HFile directly without going through HBase, we use the HFile.main() method.

20. Pros of HBase?


HBase has several advantages:

Large data sets:
It can easily handle and store large data sets on top of HDFS file storage.
Where relational databases break down:
HBase shines at the scale where relational databases break down.
Fast processing:
Reading and processing data in HBase takes little time.
Failover support and load sharing:
HBase runs on top of HDFS, which is distributed and recovers from failures automatically, so HBase data is recovered automatically as well. RegionServer replication provides the failover facility.
Scalability:
HBase scales in both linear and modular form.

21. Cons of HBase?


HBase has several disadvantages:

Single point of failure:
When only one HMaster is used, it becomes a single point of failure.
No transaction support:
HBase does not support transactions that span multiple rows.
No handling of JOINs in the database:
JOINs are handled in the MapReduce layer rather than in the database itself.

22. Specify some uses of HBase.


Common use cases of Apache HBase are:

Apache HBase is a good fit when we need random, real-time read/write access to Big Data.
It is a good choice for hosting very large tables on top of clusters of commodity hardware.
It suits workloads that need a non-relational database modeled after Google's Bigtable.

23. State some applications of HBase.


Some applications of HBase are:

Write-heavy applications can use Apache HBase.
HBase is a good choice for fast random access to available data.
Companies such as Twitter, Facebook, Yahoo, and Adobe use HBase internally.

24. How many types of HBase Operations are there?


There are two basic types of HBase Operations:

Read Operation
Write Operation

25. Explain HBase Architecture in brief?


An HBase architecture has three types of servers: HMaster, Region Server, and ZooKeeper.

i. Region servers serve data for reads and writes. Clients communicate directly with HBase RegionServers when accessing data.

ii. The HBase Master handles region assignment as well as DDL (create, delete tables) operations.

iii. ZooKeeper maintains the live cluster state.

26. What is HBase HMaster?


The HBase Master is responsible for region assignment as well as DDL (create, delete tables) operations.
The main responsibilities of a master are:

a. Coordinating the region servers

b. Admin functions

27. Explain HBase Meta Table?


The META table is a special HBase catalog table. It holds the location of the regions in the cluster.

It also keeps a list of all regions in the system.
The structure of the .META. table is:
Key: region start key, region id
Values: RegionServer
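The key/value structure above is what lets a client find the region that holds a given row. The sketch below models that lookup in plain Python under simplifying assumptions: the catalog is a sorted list of (region start key, region server) pairs, the region id is left out, and the server names are made up for illustration.

```python
import bisect

# Hypothetical catalog entries, sorted by region start key.
# The empty start key marks the first region of the table.
meta = [
    ("", "rs1.example:16020"),
    ("g", "rs2.example:16020"),
    ("p", "rs3.example:16020"),
]
start_keys = [start for start, _ in meta]


def locate(row_key):
    """Find the server of the region whose key range contains row_key."""
    # The owning region is the last one whose start key is <= row_key.
    idx = bisect.bisect_right(start_keys, row_key) - 1
    return meta[idx][1]


print(locate("apple"), locate("g"), locate("zebra"))
```

Because the entries are sorted by start key, a binary search is enough to find the right region, and a row key equal to a start key belongs to the region that begins there.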

28. What is TTL (Time to live) in HBase?

TTL is a data retention technique with which a version of a cell is preserved for a specific period of time. Once that period has elapsed, the version is removed.
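The expiry rule can be illustrated with a small sketch in plain Python. It assumes that every version carries a timestamp in seconds and that a version counts as expired once its age reaches the TTL. The function name live_versions and the exact boundary handling are assumptions made for this illustration.

```python
def live_versions(versions, ttl_seconds, now):
    """Keep only the versions that are younger than the TTL.

    versions: list of (timestamp_seconds, value) pairs.
    """
    return [(ts, v) for ts, v in versions if now - ts < ttl_seconds]


versions = [(100, "old"), (950, "new")]
remaining = live_versions(versions, ttl_seconds=600, now=1000)
print(remaining)
```

With a TTL of 600 seconds and the current time at 1000, the version written at 100 is 900 seconds old and is dropped, while the version written at 950 is kept.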

29. What is the difference between the commands delete column and delete family?

The delete column command deletes all versions of a column, whereas the delete family command deletes all columns of a particular column family.
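The difference in scope can be shown with a toy model in plain Python. A single row is assumed to be a dict keyed by (family, qualifier, version). The two functions are hypothetical helpers that mirror the behavior described above and are not the HBase client API.

```python
def delete_column(row, family, qualifier):
    """Remove every version of one column and leave the rest untouched."""
    return {
        key: value
        for key, value in row.items()
        if not (key[0] == family and key[1] == qualifier)
    }


def delete_family(row, family):
    """Remove every column, in every version, of one column family."""
    return {key: value for key, value in row.items() if key[0] != family}


row = {
    ("info", "name", 1): "Ann",
    ("info", "name", 2): "Anna",
    ("info", "email", 1): "ann@example.com",
    ("stats", "visits", 1): "7",
}
without_name = delete_column(row, "info", "name")
without_info = delete_family(row, "info")
print(sorted(without_name), sorted(without_info))
```

Deleting the column removes both versions of info:name but keeps info:email, while deleting the family removes everything under info and leaves only the stats family.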

30. What is compaction in HBase?

As more and more data is written to HBase, many HFiles get created. Compaction is the process of merging these HFiles into one file and, once the merged file has been created successfully, discarding the old files.
