Data Analytics Interview Questions – Set 06

What is the KNN imputation method?

The KNN (k-nearest neighbors) imputation method fills in a missing attribute value using the values of that attribute in the k records most similar to the record with the gap. Similarity between two records is determined with a distance function computed over the attributes that are present.
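The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function name `knn_impute`, the choice of Euclidean distance averaged over shared columns, the use of the neighbors' mean, and the sample data are all assumptions made for the example.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column in the k nearest rows."""
    def distance(a, b):
        # Compare only the columns that both rows have observed.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbors: other rows where column j is observed.
            candidates = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            candidates.sort(key=lambda c: c[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed

# Hypothetical data: the second row is missing its third attribute.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [5.0, 6.0, 9.0],
    [0.9, 1.9, 2.8],
]
filled = knn_impute(data, k=2)
```

Here the two rows closest to the incomplete record are the first and the last, so the gap is filled with the average of their third attribute rather than being pulled toward the distant third row.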

What is the difference between R-squared and adjusted R-squared?

R-squared measures the proportion of variation in the dependent variable that is explained by the independent variables.

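Adjusted R-squared is R-squared corrected for the number of independent variables in the model. Plain R-squared never decreases when a variable is added, whereas adjusted R-squared rises only if the new variable improves the fit more than would be expected by chance, so it better reflects the variables that in reality affect the dependent variable.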
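A short sketch of both quantities, using the standard formula adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1), where n is the number of observations and p the number of independent variables. The function names and the sample numbers are illustrative assumptions.

```python
def r_squared(y_true, y_pred):
    """Proportion of variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n_samples, n_predictors):
    """Penalize R-squared for the number of predictors in the model."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)

# Hypothetical observations and model predictions.
y_true = [3, 5, 7, 9, 11]
y_pred = [2.8, 5.1, 7.2, 8.9, 11.0]

r2 = r_squared(y_true, y_pred)
adj_one = adjusted_r_squared(r2, n_samples=5, n_predictors=1)
adj_three = adjusted_r_squared(r2, n_samples=5, n_predictors=3)
```

For the same R-squared, the adjusted value drops as the number of predictors grows, which is the penalty for model complexity.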

What is a waterfall chart, and when do we use it?

A waterfall chart shows the positive and negative values that lead to a final result. For example, if you are analyzing a company’s net income, you can include all the cost values in the chart. With this kind of chart, you can see visually how the value moves from revenue to net income as each cost is deducted.
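The arithmetic behind such a chart is a running total: each bar starts where the previous one ended. The sketch below computes those start and end points; the labels and amounts are hypothetical figures chosen for illustration, and drawing the bars is left to whichever plotting tool is in use.

```python
from itertools import accumulate

# Hypothetical figures: one positive starting value followed by deductions.
steps = [
    ("Revenue", 1000),
    ("Cost of goods sold", -400),
    ("Operating expenses", -250),
    ("Taxes", -70),
]

# Running total after each step.
running = list(accumulate(amount for _, amount in steps))

# Each bar spans from the previous running total to the new one.
bars = [(label, end - amount, end) for (label, amount), end in zip(steps, running)]

net_income = running[-1]
```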

What is a hash table?

In computing, a hash table is a data structure that maps keys to values and is used to implement an associative array. It uses a hash function to compute an index into an array of slots, from which the desired value can be fetched.
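A minimal sketch of the mechanism, assuming separate chaining (a list per slot) to handle keys that land in the same slot. The class name `HashTable`, the fixed slot count, and the chaining strategy are assumptions for illustration; Python's built-in `dict` is the practical choice.

```python
class HashTable:
    """Toy hash table using separate chaining; not resized as it fills."""

    def __init__(self, n_slots=8):
        self.slots = [[] for _ in range(n_slots)]

    def _index(self, key):
        # The hash function maps the key to a slot index.
        return hash(key) % len(self.slots)

    def put(self, key, value):
        bucket = self.slots[self._index(key)]
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key, default=None):
        for existing_key, value in self.slots[self._index(key)]:
            if existing_key == key:
                return value
        return default

table = HashTable()
table.put("apples", 3)
table.put("pears", 5)
table.put("apples", 4)  # replaces the earlier value
```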

What is data cleansing?

Data cleansing, also referred to as data cleaning, deals with identifying and removing errors and inconsistencies from data in order to improve its quality.
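As a small illustration, the sketch below normalizes whitespace and capitalization, drops incomplete records, and removes duplicates. The field names, the sample records, and the decision to drop (rather than repair) incomplete records are assumptions made for the example; real cleansing rules depend on the dataset.

```python
# Hypothetical raw records with inconsistent formatting.
raw = [
    {"name": " Alice ", "city": "london"},
    {"name": "alice", "city": "London"},
    {"name": "Bob", "city": ""},
    {"name": "Carol", "city": "Paris "},
]

def clean(records):
    """Normalize text fields, skip incomplete rows, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        name = rec["name"].strip().title()
        city = rec["city"].strip().title()
        if not name or not city:
            continue  # incomplete record: dropped in this sketch
        key = (name, city)
        if key in seen:
            continue  # duplicate once formatting is normalized
        seen.add(key)
        cleaned.append({"name": name, "city": city})
    return cleaned

cleaned = clean(raw)
```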

Mention some common problems that data analysts encounter during analysis.

  • Poorly formatted data files, for instance CSV data with unescaped newlines and commas in columns.
  • Inconsistent and incomplete data.
  • Misspellings and duplicate entries, a data quality problem that most data analysts face.
  • Different representations of the same value, and misclassified data.
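The first problem in the list can be seen with Python's standard `csv` module: when fields containing commas or newlines are quoted properly, a CSV-aware reader recovers them intact, while splitting naively on line breaks does not. The sample rows are hypothetical.

```python
import csv
import io

# Hypothetical rows: one field contains a comma, another a newline.
rows = [
    ["id", "comment"],
    [1, "Late, but delivered"],
    [2, "Line one\nline two"],
]

# csv.writer quotes fields that contain the delimiter or a line break.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
text = buffer.getvalue()

# A CSV-aware reader restores the three original records.
parsed = list(csv.reader(io.StringIO(text, newline="")))

# Splitting on line breaks miscounts, because of the embedded newline.
naive_lines = text.splitlines()
```

A file written without that quoting gives the reader no way to tell an embedded comma or newline from a real separator, which is why such files are hard to load.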

What is the difference between a heat map and a treemap?

A heat map is used for comparing categories with color and size. With heat maps, you can compare two different measures together. A treemap supports the same kind of comparison and, in addition, illustrates hierarchical data and part-to-whole relationships by nesting rectangles inside one another.

What is the hierarchical clustering algorithm?

A hierarchical clustering algorithm combines or divides existing groups, creating a hierarchical structure that shows the order in which the groups are merged or divided.
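The merging (bottom-up, or agglomerative) variant can be sketched as follows. The example assumes one-dimensional points and single linkage (the distance between two groups is the distance between their closest members); both choices, the function name, and the sample points are assumptions for illustration. The dividing (top-down) variant is not shown.

```python
def agglomerative(points):
    """Single-linkage agglomerative clustering on 1-D points.

    Returns the merges in the order they happen, as
    (left_group, right_group, distance) tuples.
    """
    clusters = [[p] for p in points]  # start with every point on its own
    merges = []
    while len(clusters) > 1:
        best = None
        # Find the pair of groups whose closest members are nearest.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges

merges = agglomerative([1, 3, 10, 11, 20])
```

The recorded merge order is the hierarchy: the closest pair (10 and 11) joins first, then 1 and 3, then those two groups, and the isolated point 20 joins last.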

Mention the steps of a Data Analysis project.

The core steps of a Data Analysis project include:

  • The first step is to gain an in-depth understanding of the business requirements.
  • The second step is to identify the most relevant data sources that best fit the business requirements and obtain the data from reliable and verified sources.
  • The third step involves exploring the datasets, then cleaning and organizing the data to gain a better understanding of it.
  • In the fourth step, Data Analysts must validate the data.
  • The fifth step involves implementing and tracking the datasets.
  • The final step is to create a list of the most probable outcomes and iterate until the desired results are accomplished.

What is an Outlier?

An outlier, whether univariate or multivariate, is a value that lies far from the rest of a sample and deviates from its overall pattern. It is a must-know term for any data analyst.
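One common way to flag univariate outliers is the interquartile range (IQR) rule: values more than 1.5 × IQR below the first quartile or above the third quartile are treated as outliers. The sketch below applies that rule; the rule itself, the 1.5 multiplier, the function name, and the sample values are assumptions for illustration, and multivariate outliers need other techniques.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical sample with one value far from the rest.
sample = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(sample)
```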