Hadoop Interview Questions – Set 02

What is the purpose of button groups?

Button groups are used to place more than one button on the same line.

What is distributed cache in Hadoop?

Distributed cache is a facility provided by the MapReduce framework for caching files (text files, archives, etc.) that a job needs while it executes. The framework copies the necessary files to each slave node before any task of the job runs on that node.
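As a rough sketch of the task side of this, the Python code below assumes the job is written as a script (Hadoop Streaming) and that a small side file, hypothetically named stopwords.txt, was shipped with the generic -files option so that it appears in the task's working directory. The function names are illustrative and not part of any Hadoop API.

```python
def load_lookup(path):
    """Read a small side file (one entry per line) into a set."""
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_words(lines, stopwords):
    """Yield the words from the input lines that are not in the stopword set."""
    for line in lines:
        for word in line.split():
            if word.lower() not in stopwords:
                yield word
```

A task would typically call load_lookup("stopwords.txt") once at start-up and then reuse the set for every record, which is the point of having the file copied to the node in advance.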

Which command is used to check the status of the daemons running in the Hadoop cluster?

The ‘jps’ command lists the Java processes running on a node, so it shows which Hadoop daemons (such as NameNode, DataNode, JobTracker and TaskTracker) are currently running there.

What is Hadoop Streaming?

Hadoop Streaming is a utility that allows you to create and run MapReduce jobs with any executable or script as the mapper or the reducer. The program can be written in any language, as long as it reads its input from standard input and writes its output to standard output.
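A minimal sketch of such a mapper in Python is shown below. It is a word-count example chosen for illustration: it reads lines and emits tab-separated word and count pairs. The function names are hypothetical.

```python
import sys


def map_line(line):
    """Turn one input line into (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def run_mapper(stdin=sys.stdin, stdout=sys.stdout):
    # Hadoop Streaming feeds input records on stdin, one per line,
    # and reads tab-separated key/value pairs back from stdout.
    for line in stdin:
        for word, count in map_line(line):
            stdout.write(f"{word}\t{count}\n")
```

To use it as a real mapper, the script would end with a call to run_mapper() and be passed to the streaming jar with the -mapper option.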

What is JobTracker in Hadoop?

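JobTracker is the service in Hadoop 1.x MapReduce that accepts MapReduce jobs from clients, assigns their tasks to TaskTracker nodes in the cluster, and monitors them, rescheduling tasks that fail. In Hadoop 2 and later, YARN takes over these responsibilities.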

How to debug Hadoop code?

There are many ways to debug Hadoop code, but the most popular methods are:

  • Using counters.
  • Using the web interface provided by the Hadoop framework.
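As an illustration of the counters approach, the sketch below assumes a Hadoop Streaming task written in Python. Streaming tasks update a counter by writing a line of the form reporter:counter:<group>,<counter>,<amount> to standard error. The group and counter names used here (Debug, GoodRecords, MalformedRecords) are hypothetical.

```python
import sys


def increment_counter(group, counter, amount=1, stream=sys.stderr):
    # Hadoop Streaming treats stderr lines of this form as counter updates.
    stream.write(f"reporter:counter:{group},{counter},{amount}\n")


def parse_record(line, stream=sys.stderr):
    """Return (key, value) for a well-formed line, or None and count it as bad."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 2:
        increment_counter("Debug", "MalformedRecords", stream=stream)
        return None
    increment_counter("Debug", "GoodRecords", stream=stream)
    return parts[0], parts[1]
```

The totals then appear alongside the job's built-in counters, which makes it easy to see how many records took an unexpected path without reading through task logs.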

What are the most common input formats defined in Hadoop?

These are the most common input formats defined in Hadoop:

  1. TextInputFormat
  2. KeyValueTextInputFormat
  3. SequenceFileInputFormat

TextInputFormat is the default input format.
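The difference between the two text-based formats lies in the key/value pairs they hand to the mapper. The Python sketch below only mimics that behaviour and is not Hadoop code. It assumes the usual conventions: TextInputFormat uses the byte offset of each line as the key, and the key/value text format (KeyValueTextInputFormat in Hadoop's API) splits each line at the first tab. SequenceFileInputFormat reads binary key/value files and is not modelled here.

```python
def text_input_records(data: bytes):
    """Mimic TextInputFormat: key = byte offset of the line, value = line text."""
    offset = 0
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw)


def key_value_records(data: bytes, separator="\t"):
    """Mimic the key/value text format: split each line at the first separator."""
    for raw in data.splitlines():
        line = raw.decode("utf-8")
        # A line without the separator becomes the key, with an empty value.
        key, _, value = line.partition(separator)
        yield key, value
```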

How do you categorize big data?

Big data can be categorized using the following features:

  • Volume
  • Velocity
  • Variety

What is shuffling in MapReduce?

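Shuffling is the process that transfers the map outputs to the reducers as input. Along the way, the framework sorts the outputs by key, so each reducer receives its keys in sorted order with all the values for a key grouped together.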
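The effect of the shuffle can be sketched in a few lines of Python. This is a single-process simulation and not how Hadoop implements it. The partitioning rule is a simple stand-in for Hadoop's hash-based default partitioner, and the function name is hypothetical.

```python
from collections import defaultdict


def shuffle(map_outputs, num_reducers=2):
    """Group (key, value) pairs per reducer and sort each reducer's keys."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in map_outputs:
        # Stand-in partitioner: a deterministic function of the key's bytes.
        index = sum(key.encode("utf-8")) % num_reducers
        partitions[index][key].append(value)
    # Each reducer sees its keys in sorted order, with all values for a key together.
    return [sorted(p.items()) for p in partitions]
```

With a single reducer, every key ends up in one sorted list. With more reducers, each key still goes to exactly one of them, which is what lets a reducer process all values for a key in one call.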

What commands are used to list all jobs running in the Hadoop cluster and to kill a job on Linux?

hadoop job -list

hadoop job -kill jobID