Data Analytics Interview Questions – Set 03

What is required to become a data analyst?

To become a data analyst, you need:

  • Robust knowledge of reporting packages (Business Objects), languages or ETL frameworks (XML, JavaScript, etc.), and databases (SQL, SQLite, etc.)
  • The ability to accurately collect, organize, analyze, and disseminate big data
  • Technical knowledge of database design, data models, data mining, and segmentation techniques
  • Strong knowledge of statistical packages for analyzing large datasets (SAS, Excel, SPSS, etc.)

What is data cleansing? Mention a few best practices that you have followed while cleansing data.

From a given dataset for analysis, it is extremely important to sort out the information required for data analysis. Data cleansing is a crucial step in the analysis process in which the data is inspected to find anomalies, remove repetitive records, eliminate incorrect information, and so on. Data cleansing does not involve deleting existing information from the database; it simply enhances the quality of the data so that it can be used for analysis.
Some of the best practices for data cleansing include the following (a short sketch follows the list):

  • Develop a data quality plan to identify where most data quality errors occur, so that you can assess the root cause and design the plan accordingly.
  • Follow a standard process of verifying important data before it is entered into the database.
  • Identify duplicates and validate the accuracy of the data, as this will save a lot of time during analysis.
  • Track all the cleaning operations performed on the data so that you can repeat or remove operations as necessary.
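
Below is a minimal Python/pandas sketch illustrating a few of these practices (standardizing a field, removing duplicates, flagging invalid values, and logging each operation); the DataFrame and its column names (customer_id, email, signup_date) are purely hypothetical.

```python
import pandas as pd

def clean(df: pd.DataFrame) -> pd.DataFrame:
    log = []  # track every cleaning operation so it can be repeated or undone later

    # Standardize an important field before it goes any further
    df["email"] = df["email"].str.strip().str.lower()
    log.append("normalized email whitespace and casing")

    # Identify and remove duplicate records
    before = len(df)
    df = df.drop_duplicates(subset=["customer_id"])
    log.append(f"dropped {before - len(df)} duplicate rows")

    # Flag obviously invalid values rather than silently deleting them
    df["signup_date_valid"] = pd.to_datetime(df["signup_date"], errors="coerce").notna()
    log.append("flagged unparseable signup dates")

    print("\n".join(log))
    return df
```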

What is the difference between NVL and NVL2 functions in SQL?

NVL(exp1, exp2) and NVL2(exp1, exp2, exp3) are functions which check whether the value of exp1 is null or not.

If we use the NVL(exp1, exp2) function and exp1 is not null, the value of exp1 will be returned; otherwise, the value of exp2 will be returned. Note that exp2 must be of the same data type as exp1.

Similarly, if we use the NVL2(exp1, exp2, exp3) function and exp1 is not null, exp2 will be returned; otherwise, the value of exp3 will be returned.
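
The null-handling behavior described above can be sketched in plain Python for illustration; the functions below are hypothetical stand-ins mirroring the logic, not the SQL built-ins themselves.

```python
def nvl(exp1, exp2):
    # Return exp1 if it is not null (None in Python), otherwise exp2
    return exp1 if exp1 is not None else exp2

def nvl2(exp1, exp2, exp3):
    # Return exp2 if exp1 is not null, otherwise exp3
    return exp2 if exp1 is not None else exp3

print(nvl(None, "fallback"))           # fallback
print(nvl("value", "fallback"))        # value
print(nvl2("value", "set", "unset"))   # set
print(nvl2(None, "set", "unset"))      # unset
```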

What should be done with suspected or missing data?

  • Prepare a validation report that gives information about all suspect data, such as the validation criteria it failed and the date and time of the occurrence.
  • Experienced personnel should examine the suspicious data to determine its acceptability.
  • Invalid data should be assigned a validation code and replaced.
  • To handle missing data, use the most suitable analysis strategy, such as deletion methods, single imputation methods, model-based methods, etc. (a small example follows this list).
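
As an illustration of the last point, here is a minimal pandas sketch contrasting the deletion method with simple mean imputation; the data and column names are made up.

```python
import pandas as pd

df = pd.DataFrame({"age": [25, None, 31, None, 40],
                   "score": [3, 4, None, 5, 2]})

# Deletion method: drop every row that contains a missing value
dropped = df.dropna()

# Single imputation: replace each missing value with its column mean
imputed = df.fillna(df.mean(numeric_only=True))

print(dropped)
print(imputed)
```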

Define “Time Series Analysis”.

Time series analysis can usually be performed in two domains – the time domain and the frequency domain.
Time series analysis is a method in which the future output of a process is forecast by analyzing data collected in the past, using techniques such as exponential smoothing, log-linear regression, etc.
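
A minimal sketch of simple exponential smoothing in Python is shown below; the series and the smoothing factor alpha are made up for illustration.

```python
def exponential_smoothing(series, alpha=0.3):
    # Each smoothed value is a weighted average of the new observation
    # and the previous smoothed value.
    smoothed = [series[0]]  # seed with the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

history = [112, 118, 132, 129, 121, 135, 148, 148]
fitted = exponential_smoothing(history)
forecast = fitted[-1]  # the one-step-ahead forecast is the last smoothed value
print(round(forecast, 2))
```

A smaller alpha weights the history more heavily (smoother output), while a larger alpha reacts faster to recent observations.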

What two steps are performed during the data validation process?

You should be able to demonstrate to your interviewer that you know and understand these steps, so be prepared for this question. Be sure not only to name the two steps, data screening and data verification, but also to explain how each of them is performed.

Which area would you prefer to work in and why?

There are many different types of data analysts, including operations analysts, marketing analysts, financial analysts, and more. Explain which type you prefer. Be specific in your answer to indicate to the interviewer that you’ve done your research.

You might answer something like this:

“I would prefer to work as a marketing analyst because it’s in line with my skills and interests. In addition, I have seen that the companies that hire for this role work in industries that are booming and can therefore provide good career growth.”

A car travels a distance of 60 miles at an average speed of 30 miles per hour. How fast does the car need to travel on the way back (taking the same road) in order to average 40 miles per hour over the course of the entire trip?

You need to set up the following calculation:

The total distance that needs to be traveled both ways is 120 miles. The average speed we need to achieve is 40 miles per hour; therefore, the car must complete the entire trip in 3 hours:

120 miles/40 miles per hour = 3 hours

The car has already traveled for two hours:

60 miles/30 miles per hour = 2 hours

So, on the way back, the car has only 1 hour left to travel. The remaining distance is 60 miles; hence, the car needs to travel at 60 miles per hour.
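
The arithmetic can be checked with a few lines of Python:

```python
total_distance = 120                           # miles, both ways
target_average = 40                            # miles per hour
total_time = total_distance / target_average   # 3 hours allowed in total
time_used = 60 / 30                            # 2 hours already spent on the way out
time_left = total_time - time_used             # 1 hour remaining
speed_back = 60 / time_left                    # remaining 60 miles in 1 hour
print(speed_back)                              # 60.0 miles per hour
```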

What are the different types of hypothesis testing?

The different types of hypothesis testing are as follows (a short example follows the list):

  • T-test: A t-test is used when the population standard deviation is unknown and the sample size is comparatively small.
  • Chi-Square Test for Independence: This test is used to determine whether there is a significant association between categorical variables in the sample.
  • Analysis of Variance (ANOVA): This kind of hypothesis test is used to analyze differences between the means of several groups. It is applied similarly to a t-test, but for more than two groups.
  • Welch’s T-test: This test is used to test for equality of means between two samples without assuming that they have equal variances.
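
A short example of running these tests with scipy.stats is shown below; the sample data are made up for illustration.

```python
import numpy as np
from scipy import stats

a = np.array([2.1, 2.5, 2.8, 3.0, 2.6])
b = np.array([3.1, 3.4, 2.9, 3.6, 3.3])
c = np.array([4.0, 3.8, 4.2, 3.9, 4.1])

# T-test for two independent samples (assumes equal variances)
print(stats.ttest_ind(a, b))

# Welch's t-test (does not assume equal variances)
print(stats.ttest_ind(a, b, equal_var=False))

# One-way ANOVA across more than two groups
print(stats.f_oneway(a, b, c))

# Chi-square test of independence on a 2x2 contingency table
table = np.array([[30, 10], [20, 25]])
chi2, p, dof, expected = stats.chi2_contingency(table)
print(chi2, p)
```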

What are the data validation methods used in data analytics?

The various types of data validation methods used are as follows (a short sketch follows the list):

  • Field Level Validation – Validation is done in each field as the user enters the data, to avoid errors caused by human interaction.
  • Form Level Validation – In this method, validation is done once the user completes the form, before the information is saved.
  • Data Saving Validation – This type of validation is performed while the actual file or database record is being saved. It is usually done when there are multiple data entry forms.
  • Search Criteria Validation – This type of validation ensures that a user’s search criteria return relevant, valid results by matching what the user is looking for to a certain degree.
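
A minimal Python sketch contrasting field-level and form-level validation is shown below; the field names and rules are hypothetical.

```python
def validate_field(name, value):
    # Field-level validation: check each value as the user enters it
    if name == "age" and not str(value).isdigit():
        return "age must be a number"
    if name == "email" and "@" not in str(value):
        return "email must contain '@'"
    return None

def validate_form(form):
    # Form-level validation: check the completed form before it is saved
    errors = [err for err in (validate_field(k, v) for k, v in form.items()) if err]
    if not form.get("name"):
        errors.append("name is required")
    return errors

print(validate_form({"name": "Ada", "age": "36", "email": "ada@example.com"}))  # []
print(validate_form({"name": "", "age": "abc", "email": "nope"}))  # three errors
```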