Machine Learning Interview Questions

How to check if the regression model fits the data well?

There are a few metrics that you can use: R-squared/Adjusted R-squared is a relative measure of fit (this was explained in a previous answer); the F-test evaluates the null hypothesis that all regression coefficients are equal to zero against the alternative hypothesis that at least one is not; and RMSE is an absolute measure of fit.
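
As a rough illustration, here is a minimal sketch, assuming scikit-learn and NumPy with toy data and illustrative variable names, of computing R-squared, adjusted R-squared, and RMSE for a fitted regression model (the F-test statistic and its p-value are also reported in a statsmodels OLS summary):

# Minimal sketch: R-squared, adjusted R-squared, and RMSE for a fitted model
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score, mean_squared_error

X = np.random.rand(100, 3)                        # 100 samples, 3 features (toy data)
y = X @ np.array([2.0, -1.0, 0.5]) + np.random.randn(100) * 0.1

model = LinearRegression().fit(X, y)
y_pred = model.predict(X)

r2 = r2_score(y, y_pred)                          # relative measure of fit
n, p = X.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)     # penalizes extra predictors
rmse = np.sqrt(mean_squared_error(y, y_pred))     # absolute measure of fit

print(f"R^2={r2:.3f}  adj R^2={adj_r2:.3f}  RMSE={rmse:.3f}")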

What is collinearity and what to do with it? How to remove multicollinearity?

Multicollinearity exists when an independent variable is highly correlated with another independent variable in a multiple regression equation. This can be problematic because it undermines the statistical significance of an independent variable. You could use the Variance Inflation Factors (VIF) to determine if there is any multicollinearity between independent variables — a standard benchmark is …
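
As a sketch of how this check might look in practice, assuming statsmodels and pandas (the DataFrame and column names here are purely illustrative), you can compute a VIF for each independent variable:

# Minimal sketch: Variance Inflation Factors for a set of independent variables
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative data: "spending" is deliberately close to a multiple of "income"
X = pd.DataFrame({
    "income":   [40, 55, 62, 70, 85, 90],
    "spending": [38, 50, 60, 66, 80, 88],
    "age":      [23, 31, 29, 45, 52, 38],
})

X_const = sm.add_constant(X)  # add an intercept column before computing VIFs
vifs = pd.Series(
    [variance_inflation_factor(X_const.values, i) for i in range(X_const.shape[1])],
    index=X_const.columns,
)
print(vifs)  # a large VIF flags a variable to drop, combine, or regularize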

What are the assumptions required for linear regression? What if some of these assumptions are violated?

The assumptions are as follows: the sample data used to fit the model is representative of the population; the relationship between X and the mean of Y is linear; the variance of the residuals is the same for any value of X (homoscedasticity); observations are independent of each other; and for any value of X, Y …
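
A minimal sketch of how some of these assumptions can be checked, assuming statsmodels and synthetic data chosen purely for illustration: the Breusch-Pagan test probes homoscedasticity and the Durbin-Watson statistic probes independence of the residuals.

# Minimal sketch: diagnostic checks on a fitted OLS model
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 + X @ np.array([1.5, -2.0]) + rng.normal(scale=0.5, size=200)

model = sm.OLS(y, sm.add_constant(X)).fit()

# Homoscedasticity: Breusch-Pagan tests whether residual variance depends on X
bp_stat, bp_pvalue, _, _ = het_breuschpagan(model.resid, model.model.exog)

# Independence: a Durbin-Watson value near 2 suggests little autocorrelation
dw = durbin_watson(model.resid)

print(f"Breusch-Pagan p-value: {bp_pvalue:.3f}, Durbin-Watson: {dw:.2f}")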

What are the drawbacks of a linear model?

There are a couple of drawbacks of a linear model: a linear model holds some strong assumptions that may not be true in application, since it assumes a linear relationship, multivariate normality, no or little multicollinearity, no auto-correlation, and homoscedasticity; and a linear model can't be used for discrete or binary outcomes. You can't vary the model …
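
To illustrate the binary-outcome drawback, here is a small sketch, assuming scikit-learn and toy data: an ordinary linear model fit to 0/1 labels can yield predictions outside [0, 1], so they cannot be read as probabilities.

# Minimal sketch: a linear model fit to a binary target can predict outside [0, 1]
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1))
y = (X[:, 0] > 0).astype(int)        # binary target

linreg = LinearRegression().fit(X, y)
preds = linreg.predict(np.array([[-3.0], [0.0], [3.0]]))
print(preds)  # values below 0 or above 1 are not valid probabilities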

What is principal component analysis? Explain the sort of problems you would use PCA for.

In its simplest sense, PCA involves projecting higher-dimensional data (e.g. 3 dimensions) onto a lower-dimensional space (e.g. 2 dimensions). This results in a lower dimension of data (2 dimensions instead of 3 dimensions) while keeping all of the original variables in the model. PCA is commonly used for compression purposes, to reduce required memory and to …
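
A minimal sketch of this 3-dimensions-to-2-dimensions projection, assuming scikit-learn (the random toy data and the choice of two components are illustrative):

# Minimal sketch: reducing 3-D data to 2 principal components
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))         # 200 samples in 3 dimensions

pca = PCA(n_components=2)             # project onto the top 2 principal components
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                # (200, 2)
print(pca.explained_variance_ratio_)  # share of variance kept by each component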

When would you use random forests vs. SVMs and why?

There are a couple of reasons why a random forest is a better choice of model than a support vector machine: random forests allow you to determine feature importance, which SVMs cannot do directly; random forests are much quicker and simpler to build than an SVM; and for multi-class classification problems, SVMs require a one-vs-rest method, …
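
As a small sketch of the feature-importance point, assuming scikit-learn (the iris dataset and model settings are illustrative), a fitted random forest directly exposes per-feature importances:

# Minimal sketch: inspecting feature importances from a random forest
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

for name, importance in zip(load_iris().feature_names, forest.feature_importances_):
    print(f"{name}: {importance:.3f}")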
