Have you ever created or worked with statistical models? If so, please describe how you've used them to solve a business task.

As a data analyst, you don’t specifically need experience with statistical models, unless it’s required for the job you’re applying for. If you haven’t been involved in building, using, or maintaining statistical models, be open about it and mention any knowledge or partial experience you may have.

Example
“Being a data analyst, I can’t say I’ve had direct experience building statistical models. However, I’ve helped the statistical department by making sure they have access to the proper data and analyzing it. The model in question was built with the purpose of identifying the customers who were most inclined to buy additional products and predicting when they were most likely to make that decision. My job was to establish the appropriate variables used in the model and assess its performance once it was ready.”

What is Standard Deviation?

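Standard deviation is one of the most widely used measures of how much the values in a data set vary. It is the square root of the variance, that is, the square root of the average squared deviation from the mean, so it describes the typical spread of the data around the mean in the same units as the data itself. A small standard deviation means the values cluster tightly around the mean; a large one means they are widely spread.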
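The definition above can be checked with a short calculation. This is a minimal sketch using only the Python standard library; the data values are illustrative. It shows both the population version (dividing by n) and the sample version (dividing by n - 1, Bessel's correction), and compares them with the built-in `statistics.pstdev` and `statistics.stdev`.

```python
import math
import statistics

def population_std(values):
    """Square root of the mean squared deviation from the mean (divides by n)."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / len(values))

def sample_std(values):
    """Same idea, but divides by n - 1 (Bessel's correction) when the data is a sample."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values; the mean is 5
print(population_std(data))  # 2.0
print(sample_std(data))

# The standard library implements the same two definitions.
print(statistics.pstdev(data), statistics.stdev(data))
```

Which version to report depends on whether the data covers the whole population or only a sample of it.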

Which step of a data analysis project do you enjoy the most?

It's normal for a data analyst to prefer certain tasks over others. However, you will most likely be expected to handle every step of a project: querying and cleaning, analyzing, and communicating findings. So make sure you don't show dislike for any of them. Instead, use this question to highlight your strengths: focus on the task you enjoy most and explain why it's your favorite.

Example
“If I had to select one step as a favorite, it would be analyzing the data. I enjoy developing a variety of hypotheses and searching for evidence to support or refute them. Sometimes, while following my analytical plan, I have stumbled upon interesting and unexpected learnings from the data. I believe there is always something to be learned from the data, whether big or small, that will help me in future analytical projects.”

What is an Affinity Diagram?

An Affinity Diagram is an analytical tool used to cluster or organize data into subgroups based on their relationships. These data or ideas are usually generated in discussions or brainstorming sessions, and the diagram is used to analyze complex issues.

What is the difference between data profiling and data mining?

Data profiling focuses on analyzing individual attributes of the data, providing information such as each attribute's data type, frequency, and length, along with its discrete values and value ranges. By contrast, data mining aims to find patterns across the data set as a whole: for example, identifying unusual records, analyzing clusters, and discovering sequences.
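To make the profiling side concrete, here is a minimal sketch that profiles a single column using only the Python standard library. The function name `profile_column`, the chosen statistics, and the sample column are illustrative assumptions, not a standard API; it assumes the column holds values of one comparable type, with `None` marking missing entries.

```python
from collections import Counter

def profile_column(values):
    """Hypothetical helper: summarize one column's type, nulls, distinct values, and ranges."""
    non_null = [v for v in values if v is not None]
    profile = {
        "data_types": sorted({type(v).__name__ for v in non_null}),
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_values": sorted(set(non_null)),
        "frequencies": dict(Counter(non_null)),
    }
    if non_null:
        # Assumes a homogeneous column, so min/max comparisons are valid.
        profile["value_range"] = (min(non_null), max(non_null))
        if all(isinstance(v, str) for v in non_null):
            lengths = [len(v) for v in non_null]
            profile["length_range"] = (min(lengths), max(lengths))
    return profile

country = ["US", "DE", "US", None, "FR", "US"]  # illustrative column
print(profile_column(country))
```

Each result describes one attribute in isolation; data mining would instead look for structure across many records and attributes at once.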

What is the difference between standardized and unstandardized coefficients?

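A standardized coefficient is interpreted in terms of standard deviations: it gives the change in the dependent variable, in standard deviations, for a one-standard-deviation change in the predictor. An unstandardized coefficient is expressed in the variables' original units, so it gives the change in the dependent variable for a one-unit change in the predictor.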
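For a simple linear regression the two are related by a rescaling: the standardized slope equals the unstandardized slope multiplied by sd(x) / sd(y). The sketch below, using only the Python standard library, computes both with ordinary least squares. The variable names and values (`ad_spend`, `sales`) are hypothetical and only serve to show the different units.

```python
import statistics

def unstandardized_slope(x, y):
    """OLS slope for y = a + b*x, in units of y per unit of x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    return cov / var

def standardized_slope(x, y):
    """Same slope rescaled to standard-deviation units: b * sd(x) / sd(y)."""
    return unstandardized_slope(x, y) * statistics.stdev(x) / statistics.stdev(y)

ad_spend = [1, 2, 3, 4, 5]  # hypothetical, e.g. thousands of dollars
sales = [2, 4, 5, 4, 5]     # hypothetical, e.g. hundreds of units

print(unstandardized_slope(ad_spend, sales))  # 0.6: +0.6 sales units per +1 spend unit
print(standardized_slope(ad_spend, sales))    # about 0.775: SDs of sales per SD of spend
```

With a single predictor the standardized slope is the same as the Pearson correlation between x and y; with several predictors, standardized coefficients make predictors measured in different units easier to compare.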

What is the name of the framework developed by Apache for processing large datasets in a distributed computing environment?

Apache Hadoop, together with the wider Hadoop ecosystem, is the framework developed for processing large datasets in a distributed computing environment. The Hadoop ecosystem includes the following components:

HDFS -> Hadoop Distributed File System
YARN -> Yet Another Resource Negotiator (resource management)
MapReduce -> Programming model for batch data processing
Spark -> In-memory Data Processing
Pig, Hive -> Data Processing Services using Query (SQL-like)
HBase -> NoSQL Database
Mahout, Spark MLlib -> Machine Learning
Apache Drill -> SQL on Hadoop
ZooKeeper -> Cluster Coordination and Management
Oozie -> Job Scheduling
Flume, Sqoop -> Data Ingestion Services
Solr & Lucene -> Searching & Indexing
Ambari -> Provisioning, Monitoring and Maintaining Clusters
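To illustrate the MapReduce programming model listed above, here is the classic word-count example simulated in plain Python. This is only a single-process sketch of the map, shuffle, and reduce phases; it does not use the Hadoop API, and in a real cluster the framework distributes each phase across many machines.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (word, 1) pair for each word in one input record.
    return [(word.lower(), 1) for word in line.split()]

def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, as the framework does between map and reduce.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)

lines = ["big data big clusters", "data moves to compute"]  # illustrative input records
mapped = chain.from_iterable(map_phase(line) for line in lines)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)
```

Because each map call looks at one record and each reduce call looks at one key, both phases can run in parallel on different nodes, which is what makes the model suited to large datasets.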

What is correlogram analysis?

Correlogram analysis is a common form of spatial analysis in geography. A correlogram consists of a series of estimated autocorrelation coefficients, each calculated for a different spatial relationship (for example, a different distance class). It can also be constructed for distance-based data, where the raw data is expressed as distances rather than as values at individual points.
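The core idea, one autocorrelation coefficient per separation, is easiest to see in one dimension. The sketch below computes a time-series correlogram in plain Python, where each lag plays the role that a distance class plays in the spatial case. This is an analogue, not the spatial method itself: a spatial correlogram would typically compute a statistic such as Moran's I for each distance class, which is not shown here. The function names and the sample series are illustrative.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation at one lag: lagged covariance divided by total variance."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag))
    return num / denom

def correlogram(series, max_lag):
    """One coefficient for each lag from 0 to max_lag."""
    return [autocorrelation(series, k) for k in range(max_lag + 1)]

values = [1, -1, 1, -1, 1, -1]  # illustrative alternating series
print(correlogram(values, 2))
```

The coefficient at lag 0 is always 1; for this alternating series the lag-1 coefficient is strongly negative (neighbors differ) and the lag-2 coefficient is positive (values two steps apart match), which is the kind of pattern a correlogram is plotted to reveal.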

What are the various steps in an analytics project?

The steps in an analytics project typically include:

Problem definition
Data exploration
Data preparation
Modelling
Validation of data
Implementation and tracking

How will you handle the QA process when developing a predictive model to forecast customer churn?

Data analysts need input from business owners and a collaborative environment to operationalize analytics. Creating and deploying predictive models in production requires an effective, efficient, and repeatable process. Without feedback from the business owner, the model will be a one-and-done effort.

The best way to answer this question is to say that you would first partition the data into three sets: training, testing, and validation. You would then show the business owner the model's results on the validation set, which was not used for training or testing and is therefore free of the biases introduced by fitting and tuning on the first two sets. Input from the business owner or client will tell you whether your model predicts customer churn accurately and provides the desired results.
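A three-way partition like the one described above can be sketched with the Python standard library. The 60/20/20 proportions, the fixed seed, and the helper name `three_way_split` are illustrative assumptions rather than a prescribed recipe. For churn data, where churners are often a minority, a stratified split that keeps the churn rate similar in every set is often preferable; that refinement is not shown here.

```python
import random

def three_way_split(records, train_frac=0.6, test_frac=0.2, seed=42):
    """Hypothetical helper: shuffle once, then cut into training, testing, and validation sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # held out until results are shown to the business owner
    return train, test, validation

customers = list(range(100))  # hypothetical customer IDs
train, test, validation = three_way_split(customers)
print(len(train), len(test), len(validation))  # 60 20 20
```

Fixing the seed keeps the partition identical across runs, which supports the repeatable process described above.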