Data Analytics Interview Questions – Set 14

Explain the difference between R-Squared and Adjusted R-Squared.

R-Squared is a statistical measure of the proportion of variation in the dependent variable that is explained by the independent variables. Adjusted R-Squared is a modified version of R-Squared that accounts for the number of predictors in the model. Plain R-Squared never decreases when a predictor is added, even a useless one. Adjusted R-Squared penalizes each extra predictor, so it rises only when a new variable improves the fit more than would be expected by chance.
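
As a rough illustration, both measures can be computed with the standard formulas. This is a minimal sketch: the function and parameter names below are chosen for this example and do not come from any particular library.

```python
def r_squared(y_true, y_pred):
    """Share of the variance in y_true that the predictions explain."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    """Standard adjustment: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


# Same R-Squared, same sample size, different number of predictors:
# the model with more predictors gets the lower adjusted score.
few_predictors = adjusted_r_squared(0.9, n_samples=11, n_predictors=2)
many_predictors = adjusted_r_squared(0.9, n_samples=11, n_predictors=5)
print(few_predictors, many_predictors)
```

With the same R-Squared of 0.9 on 11 observations, the model with five predictors scores lower than the one with two, which is the penalty at work.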

How many X are in Y place?

This question takes many forms, but the premise of it is quite simple. It’s asking you to work through a mathematical problem, usually figuring out the number of an item in a certain place, or figuring out how much of something could potentially be sold somewhere. Here are some real examples from Glassdoor:

  • “How many piano tuners are in the city of Chicago?” (Quicken Loans)
  • “How many windows are in New York City, by your estimation?” (Petco)
  • “How many gas stations are there in the United States?” (Progressive)

The idea here is to put you in a situation where you can’t possibly know something off the top of your head, but to see you work through it anyway. That’s the trap, though. You don’t want to just give up and say, well, gee, I don’t know. As James Patounas, associate director and senior data analyst at Source One, puts it, “I have been asked something similar as well as asked something similar. I personally would not accept ‘you can’t really know’ as an answer; or, at least, I would not hire someone that thought this was a sufficient answer.”

He went on: “Mathematical modeling is typically an approximation of the real world. It is rarely an exact representation.”

Basically, you want to pull together the data you do have, or can at least approximate, and work your way through to a solution. Let’s take the number of windows in New York City as an example for the sample answer below.

Note: The figures in this answer are rough approximations, not exact facts (for example, there are actually 8.6 million people in NYC, according to 2017 data).

I believe there are about 10 million people in New York, give or take a couple million. Assuming each of them lives in a residence with three rooms and one window per room, that would make approximately 30 million windows. I’m making a few assumptions that are probably inaccurate: for instance, that everyone lives alone and that the average residence has just three rooms with one window each. Obviously, there will be a lot of variation in reality. But I think, in terms of residences, 30 million windows could be close.

Then you’d have to account for windows in businesses, subway cars, and personal vehicles. If the average subway car seats 1,000 people, with 1 window per 2 seats, that’s 500 windows per car. A little more math: I’d guess there are at least enough subway cars to seat the whole population of New York, so 10 million divided by 1,000 comes out to 10,000 cars. That makes another 5 million windows for subway cars. If half of all people own a vehicle with six windows, that’s 5 million vehicles and 30 million more windows. I’d guess there are at least 100,000 businesses with windows in NYC. Let’s just say for the sake of argument there’s an average of 10 windows each. That’s another million, though I’m sure there are way more than that.

Overall, we’re at 66 million windows (30,000,000 x 2 + 5,000,000 + 1,000,000). All of this pretty much hinges on how close I am to the actual population of New York City. Also, there are other places to find windows, such as buses or boats. But that’s a start.
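
The arithmetic in this sample answer can be written out as a short script, which also makes it easy to see how much the total depends on the population guess. This is only a sketch of the estimate above: every constant is one of the answer’s rough assumptions, not a real statistic, and the function name is made up for illustration.

```python
def estimate_windows(population):
    """Reproduce the sample answer's rough window estimate for a given population."""
    # Residences: three rooms per person, one window per room.
    residential = population * 3 * 1

    # Subway: 1,000 seats per car, one window per two seats,
    # and enough cars to seat the whole population.
    windows_per_car = 1_000 // 2
    subway_cars = population // 1_000
    subway = subway_cars * windows_per_car

    # Vehicles: half of all people own one, six windows each.
    vehicles = (population // 2) * 6

    # Businesses: 100,000 businesses with 10 windows each.
    businesses = 100_000 * 10

    return {
        "residential": residential,
        "subway": subway,
        "vehicles": vehicles,
        "businesses": businesses,
        "total": residential + subway + vehicles + businesses,
    }


print(estimate_windows(10_000_000)["total"])  # 66000000
print(estimate_windows(8_600_000)["total"])   # 56900000
```

Swapping the guessed 10 million for the 2017 figure of 8.6 million lowers the total from 66 million to about 57 million, which shows how strongly the answer hinges on that one input.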

What is your experience in conducting presentations to various audiences?

Strong presentation skills are extremely valuable for any data analyst. Employers are looking for candidates who not only possess brilliant analytical skills, but also have the confidence and eloquence to present their results to different audiences, including upper-level management and executives, and non-technical coworkers. So, when talking about the audiences you’ve presented to, make sure you mention the following:

  • Size of the audience;
  • Whether it included executives;
  • Departments and background of the audience;
  • Whether the presentation was in person or remote, as the latter can be very challenging.

Example
“In my role as a Data Analyst, I have presented to various audiences made up of coworkers and clients with differing backgrounds. I’ve given presentations to both small and larger groups. I believe the largest so far has been around 30 people, mostly colleagues from non-technical departments. All of these presentations were conducted in person, except for one, which was remote via video conference call with senior management.”

What is the default port for SQL Server?

The default TCP port for SQL Server, as registered with the Internet Assigned Numbers Authority (IANA), is 1433.

Name the framework developed by Apache for processing large data sets for an application in a distributed computing environment.

Apache Hadoop is the framework developed by Apache for processing large data sets for an application in a distributed computing environment. MapReduce is the programming model it uses to split that processing across the machines in a cluster.
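
To give a feel for the MapReduce model, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop or any Hadoop API. The function names are invented for this illustration, and a real cluster would run the map and reduce steps in parallel on many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key before reducing."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


print(word_count(["big data", "big cluster"]))
# {'big': 2, 'data': 1, 'cluster': 1}
```

Because each document is mapped independently and each key is reduced independently, both steps can be distributed, which is what makes the model suitable for large data sets.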

What is the difference between true positive rate and recall?

There is no difference; they are the same measure, with the formula:

(true positive)/(true positive + false negative)
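
A small sketch of this formula applied to lists of labels, assuming 1 marks the positive class and 0 the negative class (the function name and the sample labels are made up for illustration):

```python
def recall(y_true, y_pred):
    """True positive rate: TP / (TP + FN), with 1 as the positive label."""
    true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_negatives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return true_positives / (true_positives + false_negatives)


# Three actual positives, two of them caught by the model.
actual = [1, 1, 1, 0, 0]
predicted = [1, 0, 1, 0, 1]
print(recall(actual, predicted))  # 2 / 3
```

Note that the false positive in the last position does not affect the result, because recall only looks at the cases that are actually positive.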

What Are the Main Responsibilities of a Data Analyst?

It is important to be able to clearly define the role you’re interviewing for. Some of the responsibilities of a data analyst you can mention in your response include: analyzing data and the information related to it, creating business reports from data, and identifying areas that need improvement.

Tell me about a time you and your team were surprised by the results of a project.

When starting an analysis, most data analysts have a rough prediction of the outcome based on findings from previous projects. But there’s always room for surprise, and sometimes the results are completely unexpected. This question gives you a chance to talk about the types of analytical projects you’ve been involved in. Plus, it allows you to demonstrate your excitement about drawing new learnings from your projects. And don’t forget to mention the action you and the stakeholders took as a result of the unexpected outcome.

Example
“While performing routine analysis of a customer database, I was completely surprised to discover a customer subsegment that the company could target with a new suitable product and a relevant message. That presented a great opportunity for additional revenue for the company by utilizing a subset of an existing customer base. Everyone on my team was pleasantly surprised and soon enough we began devising strategies with Product Development to address the needs of this newly discovered subsegment.”

Consider 10 stacks of 10 coins each, where each coin weighs 10 grams. However, one of the 10 stacks is defective, and every coin in that stack weighs 9 grams. Find the minimum number of weighings needed to identify the defective stack.

The solution to this puzzle is simple. Pick 1 coin from the 1st stack, 2 coins from the 2nd stack, 3 coins from the 3rd stack, and so on up to 10 coins from the 10th stack. That gives 55 coins in total, which you weigh together.

So, if none of the coins were defective, the weight would be 55*10 = 550 grams.

However, if stack 1 turns out to be defective, the total weight would be 1 less than 550 grams, that is 549 grams. Similarly, if stack 2 were defective, the total weight would be 2 less than 550 grams, that is 548 grams. The same reasoning applies to the other 8 cases: the shortfall in grams equals the number of the defective stack.

So, just one measurement is needed to identify the defective stack.
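
The reasoning can be checked with a short simulation. This is a sketch under the puzzle’s stated numbers (10 stacks, 10-gram good coins, 9-gram defective coins); the function names are invented for this example.

```python
GOOD_WEIGHT = 10  # grams per normal coin
BAD_WEIGHT = 9    # grams per defective coin
N_STACKS = 10


def weigh_sample(defective_stack):
    """Simulate the single weighing: k coins are taken from stack k."""
    total = 0
    for stack in range(1, N_STACKS + 1):
        coin_weight = BAD_WEIGHT if stack == defective_stack else GOOD_WEIGHT
        total += stack * coin_weight
    return total


def identify_defective_stack(measured_weight):
    """Decode the stack number from how far the weight falls short of 550 g."""
    expected = GOOD_WEIGHT * N_STACKS * (N_STACKS + 1) // 2  # 55 coins * 10 g
    shortfall = expected - measured_weight
    return shortfall // (GOOD_WEIGHT - BAD_WEIGHT)


# Every possible defective stack is recovered from one measurement.
for stack in range(1, N_STACKS + 1):
    assert identify_defective_stack(weigh_sample(stack)) == stack

print(weigh_sample(2))  # 548
```

Each stack contributes a different number of coins, so each possible defective stack produces a different total weight, and a single reading is enough to tell them apart.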

Define “Collaborative Filtering”

Collaborative filtering is a technique for building recommendation systems from the behavioral data of users: it predicts what one user will like based on the preferences of other users with similar behavior. For instance, online shopping sites usually compile a list of items under “recommended for you” based on your browsing history and previous purchases. The crucial components of this technique are users, items, and the users’ interest in those items.
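
One common variant, user-based collaborative filtering with cosine similarity, can be sketched in a few lines of plain Python. The users, items, and ratings below are made-up sample data, and the function names are hypothetical. Production systems typically use dedicated libraries and much larger, sparser data.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"laptop": 5, "mouse": 4, "keyboard": 4},
    "bob": {"laptop": 5, "mouse": 5, "monitor": 4},
    "carol": {"novel": 5, "bookmark": 4},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted ratings of others."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        if similarity <= 0:
            continue  # ignore users with no overlap in taste
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("alice", ratings))  # ['monitor']
```

Here Alice and Bob rated the laptop and mouse similarly, so Bob’s monitor is suggested to Alice, while Carol’s items are ignored because she shares no rated items with Alice.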