Machine Learning Interview Questions – Set 17

What is Kernel SVM?

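Kernel SVM is short for kernel support vector machine. Kernel methods are a class of algorithms for pattern analysis, and the kernel SVM is the best known of them. A kernel function computes inner products in a higher-dimensional feature space without constructing that space explicitly (the kernel trick), which lets an SVM learn non-linear decision boundaries.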
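
As a rough illustration of what a kernel does, the sketch below checks that a degree-2 polynomial kernel on 2-D inputs returns the same value as an ordinary dot product taken after an explicit feature mapping. The function names (poly_kernel, explicit_features) and the sample points are made up for this example, and the sketch covers only the kernel computation, not SVM training.

```python
import math

def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, y):
    """Degree-2 polynomial kernel: K(x, y) = (x . y)^2."""
    return dot(x, y) ** 2

def explicit_features(x):
    """Explicit feature map for 2-D input that matches poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

# Hypothetical sample points, chosen only for illustration.
x = (1.0, 2.0)
y = (3.0, 0.5)

kernel_value = poly_kernel(x, y)
explicit_value = dot(explicit_features(x), explicit_features(y))

print(kernel_value, explicit_value)
```

Both values agree (up to floating-point rounding), but the kernel never builds the 3-D feature vectors.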

What are 3 data preprocessing techniques to handle outliers?

  • Winsorize (cap at threshold).
  • Transform to reduce skew (using Box-Cox or similar).
  • Remove outliers if you’re certain they are anomalies or measurement errors.
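
A minimal sketch of the first technique, winsorizing, using only the standard library. The winsorize helper, the nearest-rank percentile rule, the 10th/90th percentile thresholds and the sample data are all assumptions made for this example; other percentile definitions and thresholds are equally valid.

```python
def winsorize(values, lower_pct=5, upper_pct=95):
    """Cap values outside the given nearest-rank percentiles (integer percents)."""
    ordered = sorted(values)
    n = len(ordered)

    def percentile(pct):
        # Nearest-rank rule: rank = ceil(pct * n / 100), computed in integers.
        rank = max(1, (pct * n + 99) // 100)
        return ordered[rank - 1]

    low, high = percentile(lower_pct), percentile(upper_pct)
    # Clamp every value into [low, high]; the original order is preserved.
    return [min(max(v, low), high) for v in values]

# Hypothetical measurements with one extreme value.
data = [12, 14, 13, 15, 11, 14, 13, 12, 15, 250]
capped = winsorize(data, lower_pct=10, upper_pct=90)

print(capped)
```

The extreme value 250 is pulled down to the 90th-percentile value, while the remaining points are unchanged.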

How would you define the number of clusters in a clustering algorithm?

One way to choose the number of clusters is the silhouette score, which measures how similar each point is to its own cluster compared with the nearest other cluster. Clustering is often used to get a broad picture of how many groups the data contains, so a common approach is to run the algorithm for several candidate numbers of clusters and keep the one with the highest average silhouette score.

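Another technique is the elbow method: plot the within-cluster sum of squares against the number of clusters and pick the point where adding more clusters stops giving a large improvement.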
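
The silhouette comparison can be sketched in plain Python as follows. The points and the two candidate labelings are invented for illustration; in practice the labels would come from a clustering algorithm such as k-means, run once per candidate number of clusters.

```python
import math

def mean_silhouette(points, labels):
    """Average silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical data: three visually separate clumps.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

labels_k2 = [0, 0, 0, 1, 1, 1, 1, 1, 1]  # two clumps merged into one cluster
labels_k3 = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # one cluster per clump

score_k2 = mean_silhouette(points, labels_k2)
score_k3 = mean_silhouette(points, labels_k3)

print(score_k2, score_k3)
```

The three-cluster labeling scores higher, which matches the way the data was constructed.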

Keeping train and test split criteria in mind, is it good to perform scaling before the split or after the split?

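Ideally, scaling should be done after the train/test split: fit the scaler on the training set only, then apply the same fitted parameters to the test set. Fitting the scaler on the full data set before the split leaks test-set statistics into training. If the data is closely packed, so that the training-set and full-data statistics are nearly identical, scaling before or after the split should not make much difference.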
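
A minimal sketch of scaling after the split, assuming standardization (subtract the mean, divide by the standard deviation) as the scaling method. The train and test values are made up; the point is only that the mean and standard deviation are computed from the training data and then reused for the test data.

```python
import statistics

# Hypothetical feature values, already split into train and test.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

# Fit: learn the scaling parameters from the training set only.
mean = statistics.fmean(train)
std = statistics.pstdev(train)

def standardize(values):
    """Apply the parameters learned from the training set."""
    return [(v - mean) / std for v in values]

# Transform: the same parameters are applied to both sets.
train_scaled = standardize(train)
test_scaled = standardize(test)

print(train_scaled)
print(test_scaled)
```

The scaled test set is not expected to have zero mean or unit variance, since its statistics were never used.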

List all assumptions for data to be met before starting with linear regression.

Before starting linear regression, the assumptions to be met are as follows:

  • Linear relationship
  • Multivariate normality
  • No or little multicollinearity
  • No auto-correlation
  • Homoscedasticity

Why is mean square error a bad measure of model performance? What would you suggest instead?

Mean squared error (MSE) squares each error before averaging, so it gives relatively high weight to large errors and can be dominated by a few large deviations. A more robust alternative is the mean absolute error (MAE).
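
The difference is easy to see on a small invented example: one set of predictions is off by 1 everywhere, the other is perfect except for a single error of 10. The data and function names are assumptions made for this sketch.

```python
def mse(y_true, y_pred):
    """Mean squared error."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical targets and two sets of predictions.
y_true = [10, 12, 11, 13, 12]
small_errors = [11, 11, 12, 12, 13]     # every prediction off by 1
one_big_error = [10, 12, 11, 13, 22]    # four perfect, one off by 10

print(mse(y_true, small_errors), mae(y_true, small_errors))    # 1.0 1.0
print(mse(y_true, one_big_error), mae(y_true, one_big_error))  # 20.0 2.0
```

The single large error multiplies MSE by 20 but MAE only by 2.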

Explain false negative, false positive, true negative and true positive with a simple example.

Let’s consider a scenario of a fire emergency:

True Positive: The alarm rings and there is a fire.
The system predicted fire (positive), and the prediction was correct (true).
False Positive: The alarm rings, but there is no fire.
The system predicted fire (positive), but the prediction was wrong (false).
False Negative: The alarm does not ring, but there is a fire.
The system predicted no fire (negative), but the prediction was wrong (false).
True Negative: The alarm does not ring and there is no fire.
The system predicted no fire (negative), and the prediction was correct (true).
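
The four outcomes can be counted from paired observations, as in the sketch below. The eight (fire, alarm) records are invented for illustration.

```python
# actual:    1 = there was a fire, 0 = no fire
# predicted: 1 = the alarm rang,   0 = the alarm stayed silent
actual    = [1, 1, 0, 0, 1, 0, 0, 0]
predicted = [1, 0, 0, 1, 1, 0, 0, 0]

counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
for fire, alarm in zip(actual, predicted):
    if alarm and fire:
        counts["TP"] += 1   # alarm rang, fire present
    elif alarm and not fire:
        counts["FP"] += 1   # alarm rang, no fire
    elif not alarm and fire:
        counts["FN"] += 1   # alarm silent, fire present
    else:
        counts["TN"] += 1   # alarm silent, no fire

print(counts)  # {'TP': 2, 'FP': 1, 'FN': 1, 'TN': 4}
```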

A jar has 1000 coins, of which 999 are fair and 1 is double headed. Pick a coin at random, and toss it 10 times. Given that you see 10 heads, what is the probability that the next toss of that coin is also a head?

  • There are two ways of choosing a coin: pick one of the fair coins, or pick the double-headed one.
  • Probability of selecting a fair coin = 999/1000 = 0.999
  • Probability of selecting the double-headed coin = 1/1000 = 0.001
  • Let A = picking a fair coin and then seeing 10 heads, and B = picking the double-headed coin and then seeing 10 heads. Ten heads in a row can only happen through A or B, so P(10 heads) = P(A) + P(B).
  • P(A) = 0.999 * (1/2)^10 = 0.999 * (1/1024) = 0.000976
  • P(B) = 0.001 * 1 = 0.001
  • P(fair | 10 heads) = P(A) / (P(A) + P(B)) = 0.000976 / (0.000976 + 0.001) = 0.4939
  • P(double-headed | 10 heads) = P(B) / (P(A) + P(B)) = 0.001 / 0.001976 = 0.5061
  • Probability that the next toss is a head = P(fair | 10 heads) * 0.5 + P(double-headed | 10 heads) * 1 = 0.4939 * 0.5 + 0.5061 = 0.7531
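
The same calculation can be checked with exact fractions, which avoids the rounding in the intermediate steps. The variable names are chosen for this sketch only.

```python
from fractions import Fraction

# Prior probabilities of picking each kind of coin.
p_fair = Fraction(999, 1000)
p_double = Fraction(1, 1000)

# Likelihood of 10 heads in a row for each kind of coin.
like_fair = Fraction(1, 2) ** 10
like_double = Fraction(1)

# Bayes' rule: posterior probability of each coin given 10 heads.
evidence = p_fair * like_fair + p_double * like_double
post_fair = p_fair * like_fair / evidence
post_double = p_double * like_double / evidence

# Probability that the 11th toss is also a head.
p_next_head = post_fair * Fraction(1, 2) + post_double * 1

print(p_next_head, float(p_next_head))
```

The exact answer is 3047/4046, which rounds to 0.7531.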

What are the two paradigms of ensemble methods?

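The two paradigms of ensemble methods are:

  • Sequential ensemble methods, where base learners are trained one after another and each learner depends on the previous ones (for example, boosting methods such as AdaBoost).
  • Parallel ensemble methods, where base learners are trained independently of each other (for example, bagging and random forests).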

What’s a Fourier transform?

The Fourier transform is a mathematical technique that transforms a function of time into a function of frequency. It is closely related to the Fourier series. It takes a time-based pattern as input and calculates the cycle offset (phase), rotation speed (frequency) and strength (amplitude) of every possible cycle. The Fourier transform is most naturally applied to waveforms, which are functions of time or space. Applying a Fourier transform to a waveform decomposes it into a sum of sinusoids.
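
As a rough discrete analogue, the sketch below applies a naive discrete Fourier transform (DFT) to a sampled signal built from two sine waves and recovers their frequencies and amplitudes. The signal, the sample count and the helper name dft are assumptions made for this example; real code would normally use an FFT routine from a numerical library instead of this O(n^2) loop.

```python
import cmath
import math

def dft(signal):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

# Hypothetical signal: amplitude 1.0 at 3 cycles plus amplitude 0.5 at 7 cycles.
n = 64
signal = [
    1.0 * math.sin(2 * math.pi * 3 * t / n)
    + 0.5 * math.sin(2 * math.pi * 7 * t / n)
    for t in range(n)
]

spectrum = dft(signal)

# For a real signal, the first n/2 bins carry the information;
# scaling by 2/n turns bin magnitudes into sinusoid amplitudes.
amplitudes = [2 * abs(c) / n for c in spectrum[: n // 2]]

# The two strongest frequency bins.
dominant = sorted(range(n // 2), key=lambda k: amplitudes[k], reverse=True)[:2]

print(dominant)
print(round(amplitudes[3], 6), round(amplitudes[7], 6))
```

The two strongest bins are 3 and 7 cycles, with amplitudes close to 1.0 and 0.5, matching the sinusoids the signal was built from.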