Hadoop Interview Questions – Set 01

What is the difference between Input Split and HDFS Block?

The Logical division of data is called Input Split and physical division of data is called HDFS Block

Give the use of the bootstrap panel.

We use panels in bootstrap from the boxing of DOM components.

How JobTracker assign tasks to the TaskTracker?

The TaskTracker periodically sends heartbeat messages to the JobTracker to assure that it is alive. This messages also inform the JobTracker about the number of available slots. This return message updates JobTracker to know about where to schedule task.

What is heartbeat in HDFS?

Heartbeat is a signal which is used between a data node and name node, and between task tracker and job tracker. If the name node or job tracker doesn’t respond to the signal then it is considered that there is some issue with data node or task tracker.

What is the SequenceFileInputFormat in Hadoop?

In Hadoop, SequenceFileInputFormat is used to read files in sequence. It is a specific compressed binary file format which passes data between the output of one MapReduce job to the input of some other MapReduce job.

What are the network requirements for using Hadoop?

Following are the network requirement for using Hadoop:

  • Password-less SSH connection.
  • Secure Shell (SSH) for launching server processes

What is Hadoop?

Hadoop is a distributed computing platform. It is written in Java. It consists of the features like Google File System and MapReduce.

What are the functionalities of JobTracker?

These are the main tasks of JobTracker:

  • To accept jobs from the client.
  • To communicate with the NameNode to determine the location of the data.
  • To locate TaskTracker Nodes with available slots.
  • To submit the work to the chosen TaskTracker node and monitors the progress of each task.

Define TaskTracker.

TaskTracker is a node in the cluster that accepts tasks like MapReduce and Shuffle operations from a JobTracker.

What is the difference between HDFS and NAS?

HDFS data blocks are distributed across local drives of all machines in a cluster whereas, NAS data is stored on dedicated hardware.