## What is the standard approach to supervised learning?

The standard approach to supervised learning is to split the set of examples into a training set and a test set.
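As a minimal sketch (pure Python, with a hypothetical 80/20 split): the examples are shuffled, a fraction is held out as the test set, and the rest becomes the training set.

```python
import random

def train_test_split(examples, test_fraction=0.2, seed=0):
    """Shuffle the examples, then hold out a fraction as the test set."""
    rng = random.Random(seed)
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

data = list(range(100))
train, test = train_test_split(data, test_fraction=0.2)
print(len(train), len(test))  # 80 20
```

The model is then fit on `train` only, and its generalization error is estimated on the held-out `test`.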

# Machine Learning Interview Questions

## Q10. You are working on a time series data set. Your manager has asked you to build a high-accuracy model. You start with the decision tree algorithm, since you know it works fairly well on all kinds of data. Later, you try a time series regression model and get higher accuracy than the decision tree model. Can this happen? Why?

## Executing a binary classification tree algorithm is a simple task. But how does tree splitting take place? How does the tree determine which variable to split at the root node and which at its child nodes?

## I know that a linear regression model is generally evaluated using Adjusted R² or F value. How would you evaluate a logistic regression model?

## What is a Box-Cox transformation?

## What cross-validation technique would you use on a time series data set? Is it k-fold or LOOCV?

## You are given a data set with many variables, some of which you know to be highly correlated. Your manager has asked you to run PCA. Would you remove the correlated variables first? Why?

## How do XML and CSV compare in terms of size?

## What are the two techniques of Machine Learning?

## Why is rotation required in PCA? What will happen if you don’t rotate the components?

Yes, this can happen. Time series data is often dominated by a linear trend, while a decision tree algorithm is known to work best at detecting non-linear interactions. The decision tree fails to provide robust predictions here because it cannot map the linear relationship as well as a regression model does. We also know that a linear regression model can …
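A toy illustration of this point, using made-up linear data and a hand-rolled depth-1 "stump" standing in for a full decision tree: ordinary least squares recovers the trend exactly, while the tree can only output piecewise-constant values and breaks down when extrapolating to future time steps.

```python
xs = list(range(8))                # training time steps 0..7
ys = [2.0 * x + 1.0 for x in xs]   # purely linear target y = 2x + 1

# Ordinary least squares fit for y = a*x + b.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - a * mean_x

# Depth-1 regression tree: split at the mean, predict each side's average.
split = mean_x
left = [y for x, y in zip(xs, ys) if x <= split]
right = [y for x, y in zip(xs, ys) if x > split]
left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)

def tree_predict(x):
    return left_mean if x <= split else right_mean

# Evaluate on future time steps 8..11 (extrapolation).
test_xs = [8, 9, 10, 11]
test_ys = [2.0 * x + 1.0 for x in test_xs]
mse_lin = sum((a * x + b - y) ** 2 for x, y in zip(test_xs, test_ys)) / 4
mse_tree = sum((tree_predict(x) - y) ** 2 for x, y in zip(test_xs, test_ys)) / 4
print(mse_lin, mse_tree)  # linear MSE is ~0; the tree's MSE is large
```

The tree keeps predicting the last leaf's average no matter how far the series moves, which is exactly why it underperforms regression on trending data.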

The Gini index and node entropy help the binary classification tree make its splitting decisions. At each node, the tree algorithm selects the feature whose split distributes the data into the purest child nodes. According to the Gini index, if we arbitrarily pick a pair of objects from a group, then they should be of identical class …
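The Gini computation itself is small; here is a minimal sketch in plain Python (the `split_gain` helper is a name introduced for illustration, not a standard API):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: probability that two randomly drawn items differ in class."""
    n = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def split_gain(parent, left, right):
    """Impurity reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(parent) - weighted

labels = [0, 0, 0, 0, 1, 1, 1, 1]
print(gini(labels))                                # 0.5 for a 50/50 node
print(split_gain(labels, labels[:4], labels[4:]))  # 0.5: a perfect split
```

At each node the tree evaluates candidate splits this way and keeps the one with the largest impurity reduction.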

We can use the following methods: since logistic regression is used to predict probabilities, we can use the AUC-ROC curve along with the confusion matrix to determine its performance. Also, the analogue of adjusted R² in logistic regression is AIC. AIC is a measure of fit that penalizes the model for the number of coefficients. …
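AUC has a convenient rank-based interpretation: it is the probability that a randomly chosen positive example is scored above a randomly chosen negative one. A minimal sketch with made-up labels and scores:

```python
def auc(y_true, y_score):
    """AUC via pairwise comparison: fraction of (positive, negative) pairs
    where the positive gets the higher score (ties count half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]
print(auc(y_true, y_score))  # 0.75
```

In practice a library routine such as scikit-learn's `roc_auc_score` computes the same quantity.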

A Box-Cox transformation is a power transform which converts non-normal dependent variables into approximately normal ones, since normality is the most common assumption made when using many statistical techniques. It has a lambda parameter which, when set to 0, makes the transform equivalent to a log transform. It is used for variance stabilization and also to normalize …
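The transform itself is a one-liner; a minimal sketch of the formula (in practice you would let a library such as `scipy.stats.boxcox` estimate lambda by maximum likelihood rather than pick it by hand):

```python
import math

def box_cox(x, lam):
    """Box-Cox power transform for x > 0:
    (x**lam - 1) / lam for lam != 0, and log(x) in the limit lam -> 0."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam

print(box_cox(math.e, 0))  # 1.0 (reduces to the log transform)
print(box_cox(9.0, 0.5))   # 4.0: (sqrt(9) - 1) / 0.5
```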

Neither. In a time series problem, k-fold can be troublesome because there might be a pattern in year 4 or 5 which is not present in year 3. Resampling the data set will break up these trends, and we might end up validating on past years, which is incorrect. Instead, we can use a forward chaining strategy with …
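A minimal sketch of forward chaining (expanding-window) splits; the `min_train` parameter is a hypothetical knob for the smallest usable training window:

```python
def forward_chaining_splits(n, min_train=3):
    """Expanding-window splits: train on observations [0, t), test on t."""
    for t in range(min_train, n):
        yield list(range(t)), [t]

for train_idx, test_idx in forward_chaining_splits(6):
    print(train_idx, "->", test_idx)
# [0, 1, 2] -> [3]
# [0, 1, 2, 3] -> [4]
# [0, 1, 2, 3, 4] -> [5]
```

Every fold trains strictly on the past and validates on the future, which is the property plain k-fold destroys. scikit-learn ships this idea as `TimeSeriesSplit`.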

Chances are, you might be tempted to say no, but that would be incorrect. Discarding correlated variables has a substantial effect on PCA because, in the presence of correlated variables, the variance explained by a particular component gets inflated. For example: you have 3 variables in a data set, of which 2 are correlated. If you …
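A small demonstration of the inflation effect, assuming NumPy is available (the data here is made up): adding a near-duplicate of one variable pushes the first component's share of variance well above the 50% seen with two independent variables.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)  # independent of a

def first_pc_share(X):
    """Fraction of total variance captured by the first principal component."""
    cov = np.cov(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)   # ascending order
    return eigvals[-1] / eigvals.sum()

X_indep = np.column_stack([a, b])                # two uncorrelated variables
X_corr = np.column_stack([a, a + 0.01 * b, b])   # first two nearly identical
print(first_pc_share(X_indep))  # ~0.5
print(first_pc_share(X_corr))   # ~0.67: inflated by the correlated pair
```

The correlated pair contributes its variance twice along the same direction, so the leading component looks more important than the underlying signal warrants.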

In practice, XML is much more verbose than CSV and takes up a lot more space. CSV uses separators to organize data into neat columns, while XML uses tags to delineate a tree-like structure of key-value pairs. You’ll often get XML back as a way to semi-structure data from APIs or HTTP …
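A quick stdlib-only sketch with made-up records: serialize the same rows both ways and compare byte counts. CSV names each field once in the header; XML repeats every tag name twice per record.

```python
import csv
import io
import xml.etree.ElementTree as ET

rows = [{"name": "ada", "age": "36"}, {"name": "alan", "age": "41"}]

# CSV: one header line plus one compact line per record.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "age"])
writer.writeheader()
writer.writerows(rows)
csv_bytes = buf.getvalue().encode()

# XML: opening and closing tags wrap every field of every record.
root = ET.Element("people")
for row in rows:
    person = ET.SubElement(root, "person")
    for key, value in row.items():
        ET.SubElement(person, key).text = value
xml_bytes = ET.tostring(root)

print(len(csv_bytes), len(xml_bytes))  # XML is several times larger
```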

The two techniques of Machine Learning are Genetic Programming and Inductive Learning.

Rotation is a significant step in PCA because it maximizes the separation within the variance captured by the components, which makes the components easier to interpret. The motive behind doing PCA is to choose fewer components that can explain the greatest variance in a dataset. When rotation is performed, the original coordinates of the …
