Hadoop Interview Questions – Set 03

What is InputSplit in Hadoop? Explain.

When a Hadoop job runs, it splits the input files into chunks and assigns each chunk to a mapper for processing. Each such chunk is called an InputSplit. By default, the split size is tied to the HDFS block size.
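Split sizing can be influenced through the FileInputFormat API. Below is a minimal sketch using the newer org.apache.hadoop.mapreduce API; the job name and the 64 MB/256 MB bounds are illustrative, not defaults.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SplitSizeExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "split-size-demo");

            // Each InputSplit is processed by exactly one map task, so bounding
            // the split size indirectly controls how many mappers the job launches.
            FileInputFormat.setMinInputSplitSize(job, 64L * 1024 * 1024);  // 64 MB lower bound
            FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024); // 256 MB upper bound
        }
    }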

What is a combiner in Hadoop?

A Combiner is a mini-reduce process that operates only on data generated by a single mapper. When the mapper emits data, the combiner receives it as input and sends its output on to the reducer, cutting down the volume of data shuffled across the network.
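A combiner is wired into a job the same way a reducer is. In the classic word-count example, the reducer class is reused as the combiner, since summing partial counts early is safe. A minimal sketch of the job setup; TokenizerMapper and IntSumReducer are assumed to be defined as in the standard WordCount example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    public class CombinerExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "wordcount");
            job.setMapperClass(TokenizerMapper.class);  // assumed mapper class
            job.setCombinerClass(IntSumReducer.class);  // combiner runs on each mapper's output
            job.setReducerClass(IntSumReducer.class);   // same class reused as the reducer
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
        }
    }

Note that Hadoop treats the combiner as an optimization and may run it zero or more times, so it should only be used when the reduce function is commutative and associative.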

What is WebDAV in Hadoop?

WebDAV is a set of extensions to HTTP that supports editing and uploading files. On most operating systems, WebDAV shares can be mounted as filesystems, so it is possible to access HDFS as a standard filesystem by exposing HDFS over WebDAV.

Is it possible to provide multiple inputs to Hadoop? If yes, explain.

Yes, it is possible. The input format classes provide methods to add multiple directories as input to a Hadoop job, as shown in the sketch below.
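A minimal sketch of both approaches using the newer org.apache.hadoop.mapreduce API: FileInputFormat.addInputPath can be called repeatedly for several directories, while MultipleInputs additionally allows a different input format or mapper per path. The paths and the MapperA/MapperB classes are hypothetical.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;

    public class MultiInputExample {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "multi-input-demo");

            // Several directories, one shared format and mapper:
            FileInputFormat.addInputPath(job, new Path("/data/logs/2023"));
            FileInputFormat.addInputPath(job, new Path("/data/logs/2024"));

            // Or a different mapper per input path (MapperA/MapperB are hypothetical):
            MultipleInputs.addInputPath(job, new Path("/data/csv"),
                    TextInputFormat.class, MapperA.class);
            MultipleInputs.addInputPath(job, new Path("/data/tsv"),
                    TextInputFormat.class, MapperB.class);
        }
    }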

Explain the use of the .media class?

This class is used to float media objects (such as images) from one side of a container to the other.

What is NameNode in Hadoop?

NameNode is the node where Hadoop stores all file-location information for HDFS (Hadoop Distributed File System). It is the centerpiece of an HDFS file system: it keeps a record of every file in the file system and tracks where each file's data lives across the cluster's machines.
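The metadata the NameNode tracks can be inspected with HDFS's fsck utility, which reports files, their blocks, and the DataNodes holding each block (the path below is illustrative):

    hdfs fsck /data/logs -files -blocks -locations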

What is the functionality of JobTracker in Hadoop? How many instances of a JobTracker run on Hadoop cluster?

JobTracker is a daemon service used to submit and track MapReduce jobs in Hadoop. Only one JobTracker process runs on any Hadoop cluster, and it runs within its own JVM process.

Functionalities of JobTracker in Hadoop:

  • When a client application submits a job to the JobTracker, the JobTracker talks to the NameNode to find the location of the data.
  • It locates TaskTracker nodes with available slots at or near the data.
  • It assigns the work to the chosen TaskTracker nodes.
  • The TaskTracker nodes are responsible for notifying the JobTracker when a task fails. The JobTracker then decides what to do: it may resubmit the task on another node, or it may mark that task as one to avoid.

What is TextInputFormat?

In TextInputFormat, each line of the text file is a record. The key is the byte offset of the line within the file, and the value is the content of the line. For instance: Key: LongWritable, Value: Text.
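A minimal sketch of a mapper that consumes TextInputFormat records; it simply echoes each line back out with its offset, and the class name is illustrative:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // With TextInputFormat, the framework hands the mapper one line at a time:
    // the key is the line's byte offset in the file, the value is the line itself.
    public class LineMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            context.write(line, offset);
        }
    }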

What are Hadoop’s three configuration files?

Following are the three configuration files in Hadoop:

  • core-site.xml
  • mapred-site.xml
  • hdfs-site.xml
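For example, core-site.xml typically holds the default filesystem URI. A minimal sketch; the host and port are illustrative (the property is fs.defaultFS in Hadoop 2.x and later, fs.default.name in older releases):

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode-host:9000</value>
      </property>
    </configuration>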

What is Sqoop in Hadoop?

Sqoop is a tool used to transfer data between a Relational Database Management System (RDBMS) and Hadoop HDFS. Using Sqoop, you can import data from an RDBMS such as MySQL or Oracle into HDFS, as well as export data from HDFS back into an RDBMS.
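A minimal sketch of both directions; the connection string, credentials, table names, and HDFS paths are all illustrative:

    # RDBMS -> HDFS
    sqoop import --connect jdbc:mysql://dbhost/sales --username dbuser -P \
          --table orders --target-dir /user/hadoop/orders

    # HDFS -> RDBMS
    sqoop export --connect jdbc:mysql://dbhost/sales --username dbuser -P \
          --table order_summary --export-dir /user/hadoop/summary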