R Interview Questions – Set 05

What are the advantages and disadvantages of R?

Advantages

  • Open Source
  • Data Wrangling
  • Array of Packages
  • Platform Independent
  • Machine Learning Operations
    Disadvantages
  • Weak origin
  • Data Handling
  • Basic Security
  • Complicated Language
  • Lesser Speed

The with() function applies an expression to a dataset, and the by() function applies a function to each level of factors.

The lapply is used to show the output in the form of the list, whereas sapply is used to show the output in the form of a vector or data frame.

Explain the lattice package.

The lattice package is meant to improve upon the base R graphics by giving better defaults and has the ability to display multivariate relationships easily.

Define cluster.stats() and pvclust() function().

The cluster.stats() function define in the fpc package that provides a method for comparing the similarity of two cluster solutions using different validation criteria, and the pvclust() function is defined in the pvclust package that provides p-values for hierarchical clustering.

Differentiate b/w “%%” and “%/%”.

The “%%” provides a reminder of the division of the first vector with the second, and the “%/%” gives the quotient of the division of the first vector with the second.

What is a White Noise model?

It is a basic time series model and a simple example of a stationary process. A white noise model has a fixed constant mean, a fixed constant variance, and no correlation over time.

Explain Time Series Analysis.

Any metric which is measured over regular time intervals creates a time series. Analysis of time series is commercially important due to industrial necessity and relevance, especially with respect to the forecasting (demand, supply, and sale, etc.). A series of data points in which each data point is associated with a timestamp is known as time series.

Explain mashapiro.test() and barlett.test().

This function defines in the mvnormtest package and produces the Shapiro-wilk test to multivariate normality. The barlett.test() is used to provide a parametric k-sample test of the equality of variances.

What is the purpose behind R and Hadoop integration?

  • For executing Hadoop to execute R code.
  • For using R to access the data stored in Hadoop.

What is R?

R is an interpreted computer programming language which was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand”. It is a software environment used to analyze statistical information, graphical representation, reporting, and data modeling. R is the implementation of the S programming language, which is combined with lexical scoping semantics.