Machine Learning Interview Questions | Eklavya Online

Machine Learning Interview Questions

Q10. You are working on a time series data set. Your manager has asked you to build a high accuracy model. You start with the decision tree algorithm since you know it works fairly well on all kinds of data. Later, you tried a time series regression model and got higher accuracy than the decision tree model. Can this happen? Why?

Time series data is based on linearity while a decision tree algorithm is known to work best to detect non-linear interactions Decision tree fails to provide robust predictions. Why? The reason is that it couldn’t map the linear relationship as good as a regression model did. We also know that a linear regression model can …

Q10. You are working on a time series data set. Your manager has asked you to build a high accuracy model. You start with the decision tree algorithm since you know it works fairly well on all kinds of data. Later, you tried a time series regression model and got higher accuracy than the decision tree model. Can this happen? Why? Read More »

Executing a binary classification tree algorithm is a simple task. But, how does a tree splitting take place? How does the tree determine which variable to break at the root node and which at its child nodes?

Gini index and Node Entropy assist the binary classification tree to take decisions. Basically, the tree algorithm determines the feasible feature that is used to distribute data into the most genuine child nodes. According to Gini index, if we arbitrarily pick a pair of objects from a group, then they should be of identical class …

Executing a binary classification tree algorithm is a simple task. But, how does a tree splitting take place? How does the tree determine which variable to break at the root node and which at its child nodes? Read More »

I know that a linear regression model is generally evaluated using Adjusted R² or F value. How would you evaluate a logistic regression model?

: We can use the following methods: Since logistic regression is used to predict probabilities, we can use AUC-ROC curve along with confusion matrix to determine its performance. Also, the analogous metric of adjusted R² in logistic regression is AIC. AIC is the measure of fit which penalizes model for the number of model coefficients. …

I know that a linear regression model is generally evaluated using Adjusted R² or F value. How would you evaluate a logistic regression model? Read More »

What is a Box-Cox transformation?

Box-Cox transformation is a power transform which transforms non-normal dependent variables into normal variables as normality is the most common assumption made while using many statistical techniques. It has a lambda parameter which when set to 0 implies that this transform is equivalent to log-transform. It is used for variance stabilization and also to normalize …

What is a Box-Cox transformation? Read More »

What cross validation technique would you use on time series data set? Is it k-fold or LOOCV?

Neither. In time series problem, k fold can be troublesome because there might be some pattern in year 4 or 5 which is not in year 3. Resampling the data set will separate these trends, and we might end up validation on past years, which is incorrect. Instead, we can use forward chaining strategy with …

What cross validation technique would you use on time series data set? Is it k-fold or LOOCV? Read More »

You are given a data set. The data set contains many variables, some of which are highly correlated and you know about it. Your manager has asked you to run PCA. Would you remove correlated variables first? Why?

Chances are, you might be tempted to say No, but that would be incorrect. Discarding correlated variables have a substantial effect on PCA because, in presence of correlated variables, the variance explained by a particular component gets inflated. For example: You have 3 variables in a data set, of which 2 are correlated. If you …

You are given a data set. The data set contains many variables, some of which are highly correlated and you know about it. Your manager has asked you to run PCA. Would you remove correlated variables first? Why? Read More »

How does XML and CSVs compare in terms of size?

In practice, XML is much more verbose than CSVs are and takes up a lot more space. CSVs use some separators to categorize and organize data into neat columns. XML uses tags to delineate a tree-like structure for key-value pairs. You’ll often get XML back as a way to semi-structure data from APIs or HTTP …

How does XML and CSVs compare in terms of size? Read More »

Why rotation is required in PCA? What will happen if you don’t rotate the components?

Rotation is a significant step in PCA as it maximizes the separation within the variance obtained by components. Due to this, the interpretation of components becomes easier. The motive behind doing PCA is to choose fewer components that can explain the greatest variance in a dataset. When rotation is performed, the original coordinates of the …

Why rotation is required in PCA? What will happen if you don’t rotate the components? Read More »