Data Analytics Interview Questions – Set 11

How would you assess your writing skills? When do you use written communication in your role as a data analyst?

Working with numbers is not the only aspect of a data analyst’s job. Data analysts also need strong writing skills so they can present the results of their analysis to management and stakeholders effectively. If you think you are not the greatest data “storyteller”, make sure you are working on it, e.g. through additional training.

Example
“Over time, I’ve had plenty of opportunities to enhance my writing skills, whether through email communication with coworkers or through writing analytical project summaries for upper management. I believe I can interpret data in a clear and succinct manner. However, I’m constantly looking for ways to improve my writing skills even further.”

If you are given an unsorted data set, how will you read the last observation into a new data set?

We can read the last observation into a new data set using the END= option of the SET statement.

For example:

  data example.newdataset;
    set example.olddataset end=last;  /* last = 1 on the final observation */
    if last;                          /* keep only that observation */
  run;

Here newdataset is the data set to be created and olddataset is the existing data set. last is a temporary variable (initialized to 0) that is set to 1 when the SET statement reads the last observation. The subsetting IF statement then writes only that observation to newdataset.

Explain what you do with suspicious or missing data.

When data looks suspicious or values are missing:

  • Prepare a validation report that documents the suspected data.
  • Have experienced personnel review it so that its acceptability can be determined.
  • Flag invalid data with a validation code and correct it.
  • Choose the analysis strategy best suited to the missing data, such as simple imputation or casewise deletion.
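
Two of the strategies named above, casewise deletion and simple (mean) imputation, can be sketched in a few lines of Python. This is only an illustration: the function names, the sample rows, and the use of None to mark a missing value are assumptions made for the example, not part of any particular tool.

```python
from statistics import mean


def listwise_delete(rows):
    """Casewise deletion: drop every row with at least one missing (None) value."""
    return [row for row in rows if None not in row.values()]


def mean_impute(rows, column):
    """Simple imputation: fill missing values in one numeric column with the observed mean.

    Raises statistics.StatisticsError if the column has no observed values.
    """
    observed = [row[column] for row in rows if row[column] is not None]
    fill = mean(observed)
    return [
        {**row, column: fill if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical sample data: the second record has a missing score.
rows = [
    {"id": 1, "score": 10},
    {"id": 2, "score": None},
    {"id": 3, "score": 20},
]

complete_rows = listwise_delete(rows)      # keeps records 1 and 3
imputed_rows = mean_impute(rows, "score")  # record 2 gets the mean of 10 and 20
```

Deletion is the simpler option but discards whole records, while imputation keeps every record at the cost of inventing a value. Which one is appropriate depends on how much data is missing and why.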

What are the criteria for a good data model?

The criteria for a good data model include:

  • It can be easily consumed
  • It should scale well when the volume of data changes substantially
  • It should provide predictable performance
  • A good model can adapt to changes in requirements

What’s your experience in creating dashboards? Can you share what tools you’ve used for the purpose?

Dashboards are essential for managers, as they visually capture KPIs and metrics and help them track business goals. For that reason, data analysts are often involved in both building and updating dashboards. Some of the best tools for the purpose are Excel, Tableau, and Power BI (so make sure you’ve got a good command of those). When you talk about your experience, outline the types of data visualizations and metrics you used in your dashboards.

Example
“In my line of work, I’ve created dashboards related to customer analytics in both Power BI and Excel. That means I used marketing metrics, such as brand awareness, sales, and customer satisfaction. To visualize the data, I used pie charts, bar graphs, line graphs, and tables.”

There are 3 jars, all of them mislabeled. One contains only black balls, one contains only white balls, and the third contains a mixture of black and white balls. You may pick as many balls as you need to label each jar correctly. What is the minimum number of balls you must pick?

The key condition is that every jar is mislabeled, so no jar contains what its label says. In particular, the jar labeled Black + White cannot contain the mixture, so it holds either only black balls or only white balls. Pick one ball from that jar and assume it is black. That jar must then be the Black jar. Two jars remain, labeled Black and White, and their contents must be White and Black + White. The jar labeled White cannot contain white balls, so it must be the Black + White jar. The jar labeled Black is therefore the White jar. The same reasoning applies, with the colors swapped, if the ball you pick is white. So picking a single ball is enough to label all three jars correctly.
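
The argument can be checked by brute force. The sketch below is illustrative only (the function names are made up for this example): it enumerates every way the three contents can sit behind the three labels with all labels wrong, then confirms that the color of one ball from the jar labeled Black + White leaves exactly one possibility.

```python
from itertools import permutations

LABELS = ("Black", "White", "Black + White")


def mislabeled_assignments():
    """Return every label -> contents mapping in which all three labels are wrong."""
    return [
        dict(zip(LABELS, contents))
        for contents in permutations(LABELS)
        if all(label != content for label, content in zip(LABELS, contents))
    ]


def deduce(ball_color):
    """Work out all contents from one ball drawn from the jar labeled 'Black + White'.

    That jar is mislabeled, so it holds a single color and the ball reveals it.
    """
    candidates = [
        assignment
        for assignment in mislabeled_assignments()
        if assignment["Black + White"] == ball_color
    ]
    # Exactly one fully mislabeled arrangement is consistent with the draw.
    assert len(candidates) == 1
    return candidates[0]


for color in ("Black", "White"):
    print(color, "->", deduce(color))
```

Only two fully mislabeled arrangements exist, and they differ in what the Black + White jar holds, which is why a single draw from that jar settles everything.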

What is “Clustering?” Name the properties of clustering algorithms.

Clustering is a method of grouping data into clusters so that items in the same cluster are more similar to each other than to items in other clusters. A clustering algorithm has the following properties:

  • Hierarchical or flat
  • Hard or soft
  • Iterative
  • Disjunctive

What are some of the most popular tools used in data analytics?

The most popular tools used in data analytics are:

  • Tableau
  • Google Fusion Tables
  • Google Search Operators
  • Konstanz Information Miner (KNIME)
  • RapidMiner
  • Solver
  • OpenRefine
  • NodeXL
  • Io
  • Pentaho
  • SQL Server Reporting Services (SSRS)
  • Microsoft data management stack

What is a Print Area and how can you set it in Excel?

A Print Area in Excel is a range of cells that you designate to print whenever you print that worksheet. For example, if you just want to print the first 20 rows from the entire worksheet, then you can set the first 20 rows as the Print Area.

To set the Print Area in Excel, follow these steps:

  • Select the cells for which you want to set the Print Area.
  • Then, click on the Page Layout Tab.
  • Click on Print Area.
  • Click on Set Print Area.

What is the difference between data mining and data profiling?

The difference between data mining and data profiling is as follows:

Data profiling: It focuses on instance-level analysis of individual attributes. It provides information about each attribute, such as value range, discrete values and their frequencies, occurrence of null values, data type, and length.

Data mining: It focuses on cluster analysis, detection of unusual records, dependencies, sequence discovery, relationships that hold between several attributes, etc.
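
As a rough illustration of the profiling side, the sketch below computes the per-attribute statistics listed above for a single column. The function name, the dictionary keys, and the use of None for a null value are assumptions made for the example, and the column is assumed to hold values of one comparable type.

```python
from collections import Counter


def profile_column(values):
    """Summarize one attribute: nulls, data types, value range, frequencies, length."""
    present = [v for v in values if v is not None]
    return {
        "null_count": len(values) - len(present),
        "types": sorted({type(v).__name__ for v in present}),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "frequencies": dict(Counter(present)),
        "max_length": max(len(str(v)) for v in present) if present else None,
    }


# Hypothetical column with one null and one repeated value.
summary = profile_column([3, 1, None, 3])
```

Profiling of this kind looks at one attribute at a time. Data mining, by contrast, looks for patterns across records and attributes.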