Data Analytics Interview Questions – Set 09

What are the key skills required for a data analyst?

A data analyst must have the following skills:

  • Database knowledge: database management, data blending, querying, and data manipulation
  • Predictive analytics: basic descriptive statistics, predictive modeling, and advanced analytics
  • Big data knowledge: big data analytics, unstructured data analysis, and machine learning
  • Presentation skills: data visualization, insight presentation, and report design

What are the characteristics of a good data model?

For a data model to be considered good and well developed, it must have the following characteristics:

  • It should have predictable performance so that outcomes can be estimated accurately, or at least closely.
  • It should be adaptive and responsive to changes so that it can accommodate the growing business needs from time to time.
  • It should be capable of scaling in proportion to the changes in data.
  • It should be consumable to allow clients/customers to reap tangible and profitable results.

Describe a time when you had to persuade others. How did you get buy-in?

The trick to this question is to demonstrate that you not only persuaded others of a decision, but that it was the right decision.

When I was a data analyst intern at my last company, we didn’t really have a modern means of transferring files between coworkers. We used flash drives. It took some work, but eventually I convinced my manager to let me research file-sharing services that would work best for our team. We tried Google Drive and Dropbox, but eventually we settled on SharePoint drives because they integrated well with some of the software we were already using on a daily basis, especially Excel. The change improved productivity and cut the time wasted searching for who had which files at what times.

Can you share details about the largest data set you’ve worked with? How many entries and variables did the data set comprise? What kind of data was included?

Working with large datasets and dealing with a substantial number of variables and columns is important for a lot of hiring managers. When answering the question, you don’t have to reveal background information about the project or how you managed each stage. Focus on the size and type of data.

Example Answer
“I believe the largest data set I’ve worked with was within a joint software development project. The data set comprised more than a million records and 600-700 variables. My team and I had to work with marketing data, which we later loaded into an analytical tool to perform exploratory data analysis (EDA).”

What would be the result of the following SAS statements (given that 31 Dec 2017 is a Saturday)? Weeks = intck('week', '31dec2017'd, '01jan2018'd); Years = intck('year', '31dec2017'd, '01jan2018'd); Months = intck('month', '31dec2017'd, '01jan2018'd);

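INTCK counts the number of interval boundaries crossed between the two dates, not the elapsed time. Taking the question's premise that 31st December 2017 is a Saturday, 1st January 2018 is a Sunday, which SAS treats as the first day of the next week.

  • Hence, Weeks = 1, since the two dates fall in different weeks.
  • Years = 1, since the two dates fall in different calendar years.
  • Months = 1, since the two dates fall in different calendar months.

Note that on the actual calendar 31st December 2017 was a Sunday, so running these statements in SAS returns Weeks = 0 (both dates fall in the same Sunday-to-Saturday week), while Years and Months are still 1.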
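
The boundary-counting idea can be sketched in Python. This is only a rough illustration and not SAS itself: the helper names week_start and intck_sketch are hypothetical, weeks are assumed to start on Sunday, and only the three intervals from the question are handled.

```python
from datetime import date, timedelta

def week_start(d):
    # date.weekday() is Monday=0 ... Sunday=6; step back to the most recent Sunday.
    return d - timedelta(days=(d.weekday() + 1) % 7)

def intck_sketch(interval, start, end):
    """Count interval boundaries crossed between start and end (hypothetical helper)."""
    if interval == "week":
        return (week_start(end) - week_start(start)).days // 7
    if interval == "month":
        return (end.year - start.year) * 12 + (end.month - start.month)
    if interval == "year":
        return end.year - start.year
    raise ValueError(f"unsupported interval: {interval}")

# The dates from the question, evaluated on the real calendar
# (where 31 Dec 2017 is a Sunday and 1 Jan 2018 is a Monday).
start, end = date(2017, 12, 31), date(2018, 1, 1)
weeks = intck_sketch("week", start, end)
months = intck_sketch("month", start, end)
years = intck_sketch("year", start, end)

# A real Saturday-to-Sunday pair, matching the premise of the question.
premise_weeks = intck_sketch("week", date(2017, 12, 30), date(2017, 12, 31))
```

Here months and years are both 1 even though the dates are one day apart, because a boundary is crossed in each case. weeks is 0 for the real dates, while premise_weeks is 1 because a Saturday and the following Sunday sit on opposite sides of a week boundary.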

How does PROC SQL work?

PROC SQL processes all the observations simultaneously. The following steps occur when a PROC SQL step is executed:

  • SAS scans each statement in the SQL procedure and checks for syntax errors.
  • The SQL optimizer scans the query inside the statement and decides how the query should be executed in order to minimize the runtime.
  • Any tables in the FROM clause are loaded into the data engine, where they can then be accessed in memory.
  • The code and calculations are executed.
  • The final table is created in memory.
  • The final table is sent to the output table described in the SQL statement.

Why is KNN used to impute missing values?

KNN (k-nearest neighbors) is used for missing values under the assumption that a point's value can be approximated by the values of the points closest to it, where closeness is measured on the other variables.
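
A minimal sketch of this idea in plain Python, assuming numeric data, Euclidean distance on the non-missing columns, and a simple mean of the neighbors' values. The function name knn_impute and the sample rows are hypothetical and chosen only for illustration.

```python
import math

def knn_impute(rows, target, k=2):
    """Fill None in column `target` with the mean of that column
    over the k nearest rows that have the value present."""
    complete = [r for r in rows if r[target] is not None]
    filled = []
    for r in rows:
        if r[target] is not None:
            filled.append(list(r))
            continue

        def dist(other, r=r):
            # Distance uses every column except the one being imputed.
            return math.sqrt(sum((a - b) ** 2
                                 for i, (a, b) in enumerate(zip(r, other))
                                 if i != target))

        nearest = sorted(complete, key=dist)[:k]
        new_row = list(r)
        new_row[target] = sum(n[target] for n in nearest) / len(nearest)
        filled.append(new_row)
    return filled

# Column 0 is a known feature, column 1 has one missing value.
rows = [[1.0, 10.0], [2.0, 20.0], [10.0, 100.0], [1.5, None]]
imputed = knn_impute(rows, target=1, k=2)
```

The last row is closest to the first two rows on column 0, so its missing value is filled with the mean of 10.0 and 20.0 and the distant third row is ignored.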

What is an n-gram?

An n-gram is a contiguous sequence of n items from a given sequence of text or speech. An n-gram model is a type of probabilistic language model that predicts the next item in such a sequence in the form of an (n-1)-order Markov model.
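
As a small illustration, the sketch below extracts n-grams from a token list and counts which item follows each (n-1)-item context. The function names ngrams and next_item_counts and the sample sentence are assumptions made for this example.

```python
from collections import Counter, defaultdict

def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def next_item_counts(tokens, n):
    """Map each (n-1)-item context to a Counter of the items that follow it."""
    counts = defaultdict(Counter)
    for gram in ngrams(tokens, n):
        counts[gram[:-1]][gram[-1]] += 1
    return counts

tokens = "to be or not to be".split()
bigrams = ngrams(tokens, 2)
model = next_item_counts(tokens, 2)

# The most frequent follower of "to" in this tiny sample.
most_likely_after_to = model[("to",)].most_common(1)[0][0]
```

With n = 2 the context is a single word, so predicting the next word depends only on the word before it. In this sample "to" is followed by "be" both times it appears.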

Which Excel functions have you used on a regular basis so far? Can you describe in detail how you’ve used Excel as an analytical tool in your projects?

If you are an Excel expert, it would be difficult to list all the functions you have experience using. Instead, concentrate on highlighting the more difficult ones, particularly statistical functions. If you have experience utilizing the more challenging functions, hiring managers will presume you have experience using the more basic ones. Be sure to highlight your pivot table skills, as well as your ability to create graphs in Excel. If you have not attained these skills yet, it is worthwhile to invest in training to learn them.

If you’re an Excel pro, there is no need to recite each and every function you’ve used. Instead, highlight your advanced Excel skills, such as working with statistical functions, pivot tables, and graphs. Of course, if you lack the experience, it’s worth considering a specialized Excel training that will help you build a competitive skillset.

Example
“I think I’ve used Excel every day of my data analyst career in every single phase of my analytical projects. For example, I’ve checked, cleaned, and analyzed data sets using Pivot tables. I’ve also turned to statistical functions to calculate standard deviations, correlation coefficients, and others. Not to mention that the Excel graphing function is great for developing visual summaries of the data. As a case in point, I’ve worked with raw data from external vendors in many customer satisfaction surveys. First, I’d use sort functions and pivot tables to ensure the data was clean and loaded properly. In the analysis phase, I’d segment the data with pivot tables and the statistical functions, if necessary. Finally, I’d build tables and graphs for efficient visual representation.”

Can you explain how to embed Tableau views in web pages?

You can embed interactive Tableau views and dashboards into web pages, blogs, wiki pages, web applications, and intranet portals. Embedded views update as the underlying data changes, or as their workbooks are updated on Tableau Server. Embedded views follow the same licensing and permission restrictions used on Tableau Server. That is, to see a Tableau view that’s embedded in a web page, the person accessing the view must also have an account on Tableau Server.

Alternatively, if your organization uses a core-based license on Tableau Server, a Guest account is available. This allows people in your organization to view and interact with Tableau views embedded in web pages without having to sign in to the server. Contact your server or site administrator to find out if the Guest user is enabled for the site you publish to.

You can do the following to embed views and adjust their default appearance:

  • Get the embed code provided with a view: The Share button at the top of each view includes embed code that you can copy and paste into your webpage. (The Share button doesn’t appear in embedded views if you change the showShareOptions parameter to false in the code.)
  • Customize the embed code: You can customize the embed code using parameters that control the toolbar, tabs, and more. For more information, see Parameters for Embed Code.
  • Use the Tableau JavaScript API: Web developers can use Tableau JavaScript objects in web applications. To get access to the API, documentation, code examples, and the Tableau developer community, see the Tableau Developer Portal.

Define Outlier

No data analyst interview guide would be complete without this question. An outlier is a value that lies far from, and diverges from, the overall pattern in a sample. There are two kinds of outliers: univariate and multivariate.

Two common methods for detecting outliers are:

  • Box plot method: a value is an outlier if it lies more than 1.5*IQR (interquartile range) above the upper quartile (Q3) or more than 1.5*IQR below the lower quartile (Q1).
  • Standard deviation method: a value is an outlier if it falls outside the range mean ± (3*standard deviation).
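
Both rules can be sketched with Python's standard statistics module. The function names and the sample values are hypothetical. The quartiles come from statistics.quantiles with its default method, and other tools may compute quartiles slightly differently.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Box plot rule: flag values beyond k*IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def sd_outliers(values, k=3):
    """Standard deviation rule: flag values more than k sample SDs from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in values if abs(v - mean) > k * sd]

# Twenty ordinary readings plus one extreme value.
values = [10, 11, 12, 13, 14] * 4 + [100]
flagged_by_iqr = iqr_outliers(values)
flagged_by_sd = sd_outliers(values)
```

Both rules flag only the value 100 here. On very small samples the standard deviation rule can miss an extreme value, because that value inflates the standard deviation it is measured against.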