Data Analytics Interview Questions – Set 13

As a data analyst, you’ll often work with stakeholders who lack a technical background and a deeper understanding of data and databases. Have you ever been in a situation like this, and how did you handle it?

Data analysts often need to communicate findings to coworkers from other departments or to senior management with a limited understanding of data. This requires the ability to explain technical terms in non-technical language. It also takes patience to listen to your coworkers’ questions and answer them in an easy-to-digest way. Show the interviewer that you can work efficiently with people from different backgrounds who don’t speak your “language”.

Example
“In my work with stakeholders, it often comes down to the same challenge – facing a question I don’t have the answer to, due to limitations of the gathered data or the structure of the database. In such cases, I analyze the available data to deliver answers to the most closely related questions. Then, I give the stakeholders a basic explanation of the current data limitations and propose the development of a project that would allow us to gather the unavailable data in the future. This shows them that I care about their needs and I’m willing to go the extra mile to provide them with what they need.”

A pumpkin must be divided into 8 equal pieces using only 3 cuts. How would you do it?

The approach is simple. Make one horizontal cut through the middle of the pumpkin, then make two vertical cuts through its center at right angles to each other. Each cut through the center doubles the number of pieces (2 × 2 × 2 = 8), which gives you 8 equal pieces.

What is the K-means algorithm?

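K-means is a partitioning technique in which objects are grouped into K clusters. Each point is assigned to the nearest cluster center (centroid), and each centroid is then recomputed as the mean of its assigned points; these two steps repeat until the assignments stop changing. The algorithm assumes that the clusters are roughly spherical, with the data points gathered around each centroid, and that the clusters have similar variance.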
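The assign-and-update loop described above can be sketched in plain Python as follows. This is a minimal illustration of Lloyd's algorithm for 2-D points; the function name `kmeans` and the sample points are hypothetical, and the sketch assumes the first k points are used as the starting centroids (library implementations usually use random or k-means++ initialization instead).

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster 2-D points into k groups with the basic assign/update loop."""
    # Simplified initialization (assumption): use the first k points as centroids.
    centroids = list(points[:k])
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once no point changes cluster.
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids

# Hypothetical data: two well-separated groups of points.
points = [(1, 1), (10, 10), (1.5, 2), (9, 11), (0.5, 1.2), (10.5, 9.5)]
labels, centroids = kmeans(points, k=2)
print(labels)     # cluster index for each point
print(centroids)  # final cluster centers
```

Because the result depends on the starting centroids, real-world use usually runs K-means several times with different initializations and keeps the best clustering.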

What are the most popular statistical methods used when analyzing data?

The most popular statistical methods used in data analytics are:

  • Linear Regression
  • Classification
  • Resampling Methods
  • Subset Selection
  • Shrinkage
  • Dimension Reduction
  • Nonlinear Models
  • Tree-Based Methods
  • Support Vector Machines
  • Unsupervised Learning
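As one concrete instance of the resampling methods in the list above, here is a minimal percentile-bootstrap sketch for a confidence interval around a mean. The function name `bootstrap_ci`, the sample data, and the default of 2,000 resamples are illustrative assumptions, not a prescribed setup.

```python
import random
import statistics

def bootstrap_ci(data, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `data`."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    # Resample the data with replacement many times and record each mean.
    means = sorted(
        statistics.mean(rng.choices(data, k=len(data)))
        for _ in range(n_resamples)
    )
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical sample, e.g. daily order counts.
data = [12, 15, 9, 20, 17, 14, 11, 18]
lo, hi = bootstrap_ci(data)
print(f"mean={statistics.mean(data)}, 95% CI approx ({lo:.2f}, {hi:.2f})")
```

The bootstrap needs no formula for the standard error, which is why it is handy for statistics (medians, ratios) where such formulas are awkward.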

What steps can you take to handle slow Excel workbooks?

There are several ways to speed up a slow Excel workbook. Here are a few of them:

  • Try using manual calculation mode.
  • Maintain all the referenced data in a single sheet.
  • Use Excel tables and named ranges.
  • Use helper columns instead of array formulas.
  • Avoid referencing entire rows or columns.
  • Convert formulas whose results no longer need to change into static values.

What is the main difference between overfitting and underfitting?

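Overfitting: An overfit statistical model describes random error or noise instead of the underlying relationship. This typically happens when the model is too complex, for example when it has too many parameters relative to the number of observations. An overfit model has poor predictive performance because it overreacts to minor fluctuations in the training data.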

Underfitting: An underfit statistical model is unable to capture the underlying trend in the data, typically because it is too simple. This type of model performs poorly on both the training data and new data.
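A small, contrived comparison makes the difference visible. The sketch below assumes hypothetical data generated as y = 2x plus noise, and fits three models to it: a constant (too simple, so it underfits), a straight line, and a degree-5 polynomial that passes through every training point (too complex, so it overfits). The "test" set reuses the same x values with different noise, standing in for new data.

```python
def mse(preds, actual):
    """Mean squared error between predictions and observed values."""
    return sum((p - a) ** 2 for p, a in zip(preds, actual)) / len(actual)

def fit_constant(xs, ys):
    # Underfitting: ignores x entirely and always predicts the mean of y.
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    # Ordinary least-squares straight line.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

def fit_interpolating(xs, ys):
    # Overfitting: Lagrange polynomial of degree n-1 through every training point.
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            basis = 1.0
            for j, xj in enumerate(xs):
                if j != i:
                    basis *= (x - xj) / (xi - xj)
            total += yi * basis
        return total
    return predict

# Hypothetical data: true trend y = 2x, with alternating noise of +/-1.
xs = [0, 1, 2, 3, 4, 5]
y_train = [2 * x + e for x, e in zip(xs, [1, -1, 1, -1, 1, -1])]
y_test = [2 * x + e for x, e in zip(xs, [-1, 1, -1, 1, -1, 1])]

results = {}
for name, fit in [("constant", fit_constant), ("linear", fit_linear),
                  ("interpolating", fit_interpolating)]:
    model = fit(xs, y_train)
    preds = [model(x) for x in xs]
    results[name] = (mse(preds, y_train), mse(preds, y_test))

for name, (train_err, test_err) in results.items():
    print(f"{name:14s} train MSE={train_err:.3f} test MSE={test_err:.3f}")
```

The interpolating polynomial reaches zero training error by memorizing the noise, yet does worse on the new data than the straight line; the constant model is poor on both.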

What is the difference between Data Mining and Data Profiling?

Data Mining: Data mining is the analysis of data to discover previously unknown relationships and patterns. It focuses mainly on detecting unusual records (anomalies), finding dependencies, and cluster analysis.

Data Profiling: Data profiling is the process of analyzing the individual attributes (columns) of a dataset. It focuses on providing summary information about each attribute, such as its data type, the frequency of its values, and the number of missing values.
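To make the contrast concrete, here is a minimal data-profiling sketch that summarizes each column independently. The function names `profile_column` and `profile_dataset`, the chosen statistics, and the sample rows are hypothetical; dedicated profiling tools report many more attributes.

```python
from collections import Counter

def profile_column(values):
    """Summarize one attribute: types present, nulls, distinct values, top value."""
    non_null = [v for v in values if v is not None]
    counts = Counter(non_null)
    return {
        "types": sorted({type(v).__name__ for v in non_null}),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(counts),
        # (value, frequency) of the most frequent non-null value
        "most_common": counts.most_common(1)[0] if counts else None,
    }

def profile_dataset(rows):
    """Profile every column that appears in a list of dict records."""
    columns = dict.fromkeys(key for row in rows for key in row)  # keeps first-seen order
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}

# Hypothetical customer records with a missing value in each column.
rows = [
    {"city": "Paris", "age": 34},
    {"city": "Lyon", "age": 29},
    {"city": "Paris", "age": None},
    {"city": None, "age": 41},
]
profile = profile_dataset(rows)
for column, stats in profile.items():
    print(column, stats)
```

Note that each column is profiled on its own; finding relationships across columns is where data mining takes over.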

What tools are used in Big Data?

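Commonly used Big Data tools include:

  • Hadoop: a framework for distributed storage (HDFS) and processing (MapReduce) of large datasets
  • Hive: a data warehouse layer that lets you query data stored in Hadoop with a SQL-like language (HiveQL)
  • Pig: a high-level scripting platform (Pig Latin) for writing data flows that run on Hadoop
  • Flume: a service for collecting and moving large volumes of log and event data into HDFS
  • Mahout: a library of scalable machine learning algorithms
  • Sqoop: a tool for transferring bulk data between Hadoop and relational databases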

Explain univariate, bivariate, and multivariate analysis.

Univariate analysis is a descriptive statistical technique that examines a single variable at a time. It considers the range of the values and their central tendency (for example, the mean or median).
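A univariate summary of one variable can be computed with Python's standard `statistics` module. This is a minimal sketch; the function name `describe` and the sample ages are hypothetical.

```python
import statistics

def describe(values):
    """Univariate summary: spread (min, max, range, stdev) and central tendency."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

# Hypothetical single variable: customer ages.
ages = [23, 35, 31, 28, 43]
summary = describe(ages)
print(summary)
```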

Bivariate analysis analyzes two variables together to explore whether there is an empirical relationship between them. It tries to determine whether the two variables are associated and how strong that association is, or whether there are differences between them and how significant those differences are.
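One common way to measure the strength of a linear association between two variables is Pearson's correlation coefficient r, which ranges from -1 to 1. The sketch below computes it from its definition (covariance divided by the product of the standard deviations); the function name `pearson_r` and the study-hours data are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # The 1/n factors cancel, so plain sums of deviations are enough.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical pair of variables: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_r(hours, scores)
print(f"r = {r:.3f}")  # close to 1: strong positive linear association
```

A correlation close to 1 or -1 indicates a strong linear relationship, but it does not by itself show that one variable causes the other.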

Multivariate analysis is an extension of bivariate analysis. Based on the principles of multivariate statistics, it observes and analyzes more than two variables simultaneously, for example using two or more independent variables to predict the value of a dependent variable for individual subjects.
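As an example of predicting a dependent variable from several independent variables, the sketch below fits a multiple linear regression by ordinary least squares, solving the normal equations (XᵀX)β = Xᵀy with Gaussian elimination. The function names and data are hypothetical; the data were generated exactly from y = 1 + 2·x1 + 3·x2, so the fit should recover those coefficients. In practice a numerical library would be used instead of a hand-written solver.

```python
def solve(a, b):
    """Solve the square linear system a·x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_ols(rows, y):
    """Return [intercept, b1, b2, ...] for y ~ intercept + b1*x1 + b2*x2 + ..."""
    X = [[1.0, *row] for row in rows]  # leading 1.0 column gives the intercept
    p = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical data: two independent variables (x1, x2) and one dependent variable y.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [9, 8, 19, 18, 26]
coefficients = fit_ols(rows, y)
print(coefficients)  # approximately [1.0, 2.0, 3.0]
```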

Explain what a KPI, design of experiments, and the 80/20 rule are.

KPI: It stands for Key Performance Indicator. It is a measurable value that shows how effectively a business process or organization is achieving its key objectives, and it is usually tracked and reported through spreadsheets, reports, or charts.

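Design of experiments: It is the process of planning how data will be collected for statistical analysis: choosing the factors to vary and their levels, the sample, and how subjects or runs are assigned (for example, at random), so that the effect of each factor on the outcome can be measured.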
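A simple experimental design is a full factorial design, in which every combination of factor levels is tested, with the run order randomized to reduce bias from time-related effects. The sketch below generates one; the function name `full_factorial` and the factors (price, page layout, email send time) are hypothetical examples, not part of any specific methodology described above.

```python
import itertools
import random

def full_factorial(factors, seed=None):
    """Every combination of factor levels, returned in a randomized run order."""
    names = list(factors)
    runs = [
        dict(zip(names, levels))
        for levels in itertools.product(*(factors[name] for name in names))
    ]
    random.Random(seed).shuffle(runs)  # randomize the order in which runs are executed
    return runs

# Hypothetical experiment: 3 factors with 2 levels each -> 2 * 2 * 2 = 8 runs.
factors = {
    "price": [9.99, 12.99],
    "layout": ["A", "B"],
    "email_time": ["morning", "evening"],
}
design = full_factorial(factors, seed=42)
for run_number, settings in enumerate(design, start=1):
    print(run_number, settings)
```

The number of runs grows multiplicatively with the number of factors and levels, which is why larger studies often use fractional factorial designs instead.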

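80/20 rule: Also known as the Pareto principle, it states that roughly 80 percent of effects come from 20 percent of causes. For example, about 80 percent of your income may come from 20 percent of your clients.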
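A quick way to check the 80/20 rule against your own data is to compute the share of total revenue that comes from the top 20 percent of clients. The function name `top_share` and the revenue figures below are hypothetical and were chosen so that the split comes out at exactly 80/20; real data rarely splits this neatly.

```python
def top_share(revenue_by_client, top_fraction=0.2):
    """Fraction of total revenue contributed by the top `top_fraction` of clients."""
    values = sorted(revenue_by_client.values(), reverse=True)
    k = max(1, round(len(values) * top_fraction))  # number of "top" clients
    return sum(values[:k]) / sum(values)

# Hypothetical yearly revenue (in thousands) for 10 clients.
revenue = {
    "client_a": 500, "client_b": 300, "client_c": 60, "client_d": 40,
    "client_e": 30, "client_f": 25, "client_g": 20, "client_h": 15,
    "client_i": 6, "client_j": 4,
}
share = top_share(revenue)
print(f"Top 20% of clients bring in {share:.0%} of revenue")
```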