Data Analytics Interview Questions – Set 07

What was your most difficult data analysis project?

With a question like this, the interviewer gains insight into how you approach and solve problems. It also gives them an idea of the type of work you have already done. Be sure to explain the event, action, and result (EAR), avoid blaming others, and explain why the project was difficult:

“My most difficult project was on endangered animals. I had to predict how many animals would survive to 2020, 2050, and 2100. Before this, I’d only dealt with data that already existed, describing events that had already happened. So I researched the various habitats, the animals’ predators, and other factors, and then made my predictions. I have high confidence in the results.”

Why do you want to be a data analyst?

For the most part, this sort of question can serve as an icebreaker. However, sometimes, even if the interviewers don’t explicitly say it, they expect you to answer a more specific question: “Why do you want to be a data analyst for us?”

With these self-reflective questions, there’s not really a right answer I can offer you. There are wrong answers, though—red flags for which the employer is searching.

Answers that show you misunderstand the role are the main “wrong” answers here. Equally, an answer that makes you sound wishy-washy about data analysis can raise red flags.

A few things you probably want to get across include:

  1. You love data.
  2. You’ve researched the company and understand why your role as a data analyst will help it succeed.
  3. You more or less understand what’s expected of your role.
  4. You’re confident in your decision.

What is the basic syntax style of writing code in SAS?

The basic syntax style of writing code in SAS is as follows (a short example appears after this list):

  1. Write the DATA statement, which names the dataset.
  2. Write the INPUT statement to name the variables in the dataset.
  3. End every statement with a semicolon.
  4. Leave proper spacing between words and statements.
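
To make these steps concrete, here is a minimal SAS sketch; the dataset name employees and its variables are hypothetical, chosen only for illustration:

  /* The DATA statement names the dataset; the INPUT statement names its variables. */
  /* Note that every SAS statement ends with a semicolon.                           */
  data employees;
      input name $ department $ salary;
      datalines;
  Alice Sales 52000
  Bob Finance 61000
  ;
  run;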

What is the difference between data mining and data profiling?

Data profiling assesses a dataset for characteristics such as uniqueness, consistency, completeness, and logic; on its own, it does not tell you whether individual values are accurate.

Data mining is the process of discovering previously unknown patterns and relationships in data. It is the way in which raw data is turned into valuable information.
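
As a rough illustration of what profiling looks like in practice, here is a minimal SAS sketch; the dataset name have is a hypothetical placeholder:

  /* Profile uniqueness: count the distinct values of every variable. */
  proc freq data=have nlevels;
      tables _all_ / noprint;
  run;

  /* Profile completeness and ranges of the numeric variables. */
  proc means data=have n nmiss min max;
  run;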

Explain what imputation is. List the different types of imputation techniques.

During imputation, we replace missing data with substituted values. The techniques fall into two broad families, single imputation and multiple imputation (a short SAS sketch follows this list):

  • Single Imputation: Each missing value is filled in exactly once. Common forms include:
  • Hot-deck imputation: A missing value is imputed from a randomly selected similar record; the name comes from the punch-card decks from which donor records were originally drawn.
  • Cold-deck imputation: Works the same way as hot-deck imputation, except that the donor record is selected from a different dataset.
  • Mean imputation: Replaces a missing value with the mean of that variable computed from all other cases.
  • Regression imputation: Replaces a missing value with the value predicted for it by a regression on the other variables.
  • Stochastic regression imputation: The same as regression imputation, but a random residual term is added to the prediction so that variability is not understated.
  • Multiple Imputation: Unlike single imputation, multiple imputation estimates the missing values several times, producing multiple completed datasets whose analyses are then pooled.
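
Here is a minimal SAS sketch of two of these techniques; the dataset name have and the variables x1-x3 are hypothetical placeholders:

  /* Mean imputation: replace only the missing values of x1 with the mean of x1. */
  proc stdize data=have out=have_mean reponly method=mean;
      var x1;
  run;

  /* Multiple imputation: create five completed copies of the data; their analyses
     are pooled afterwards (for example, with PROC MIANALYZE).                     */
  proc mi data=have nimpute=5 seed=20240101 out=have_mi;
      var x1 x2 x3;
  run;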

What scripting languages have you used in your projects as a data analyst? Which one would you say you like best?

Most large companies work with numerous scripting languages, so a good command of more than one is definitely a plus. Nevertheless, even if you aren’t very familiar with the main language used by the company you’re applying to, you can still make a good impression. Demonstrate enthusiasm for expanding your knowledge, and point out that your fluency in other scripting languages gives you a solid foundation for learning new ones.

Example 
“I’m most confident in using SQL, since that’s the language I’ve worked with throughout my Data Analyst experience. I also have a basic understanding of Python and have recently enrolled in a Python Programming course to sharpen my skills. So far, I’ve discovered that my expertise in SQL helps me advance in Python with ease.”

What is a Truth Table?

A truth table is a table that lists the truth value of a compound proposition for every possible combination of truth values of its component propositions; with n components it has 2^n rows. Because it exhaustively enumerates every case, constructing a truth table serves as a complete decision procedure for propositional logic: it can show whether a formula is a tautology, a contradiction, or merely satisfiable. For example, the truth table for p AND q has four rows (TT, TF, FT, FF), and the compound proposition is true only in the first.
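
If it helps to see one built programmatically, here is a small SAS sketch that enumerates a two-variable truth table, with 1 standing for true and 0 for false:

  data truth_table;
      do p = 1, 0;
          do q = 1, 0;
              p_and_q = (p and q);   /* conjunction */
              p_or_q  = (p or q);    /* disjunction */
              output;                /* one row per combination: 2**2 = 4 rows */
          end;
      end;
  run;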

What should a data analyst do with missing or suspected data?

In such a case, a data analyst needs to do the following (a small SAS sketch appears after the list):

  • Identify the missing data, then handle it with strategies such as deletion, single imputation, or model-based methods.
  • Prepare a validation report containing all information about the suspected or missing data.
  • Scrutinize the suspicious records to assess their validity.
  • Replace any invalid data with a proper validation code.
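
Here is that sketch, spotting missing values and flagging suspicious records; the dataset name have and the sanity check on salary are hypothetical:

  /* Count missing and non-missing values for every numeric variable. */
  proc means data=have n nmiss;
  run;

  /* Send records that are missing or fail a basic sanity check to a review table. */
  data suspect;
      set have;
      if missing(salary) or salary < 0 then output;
  run;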

What is the difference between factor analysis and principal component analysis?

Principal component analysis aims to explain as much of the total variance in the observed variables as possible, whereas factor analysis aims to explain the covariances (correlations) among the variables in terms of a smaller number of underlying latent factors.
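
To see how the two are invoked, here is a minimal SAS sketch; the dataset have and the variables x1-x6 are hypothetical placeholders:

  /* Principal component analysis: components that capture the total variance. */
  proc princomp data=have out=pc_scores;
      var x1-x6;
  run;

  /* Factor analysis: latent factors that account for the shared variance,
     that is, the covariances among the variables.                          */
  proc factor data=have nfactors=2 rotate=varimax;
      var x1-x6;
  run;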

What is a Pivot Table, and what are the different sections of a Pivot Table?

A Pivot Table is a feature in Microsoft Excel that lets you quickly summarize huge datasets. It is easy to use because building a report is mostly a matter of dragging and dropping row and column headers.

A Pivot Table is made up of four different sections (a rough SAS analogue follows the list):

  • Values Area: The area where the summarized values are reported.
  • Rows Area: The headings that appear to the left of the values area.
  • Columns Area: The headings at the top of the values area.
  • Filters Area: An optional filter used to drill down into the dataset.
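
Here is that rough SAS analogue, built with PROC TABULATE; the sales dataset and its variables (region, product, revenue, year) are hypothetical placeholders:

  proc tabulate data=sales;
      where year = 2023;           /* Filters area           */
      class region product;        /* Rows and Columns areas */
      var revenue;                 /* Values area            */
      table region,                /* rows: region           */
            product*revenue*sum;   /* columns: summed revenue broken out by product */
  run;

The mapping is loose, but the same group-and-summarize idea underlies both tools.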