What is a depth-first search algorithm?
Depth-first search (DFS) is based on the LIFO (last-in, first-out) principle: it is implemented with recursion or an explicit LIFO stack, so nodes are visited in a different order than in BFS. At each iteration the algorithm stores only the current path from the root toward a leaf, so its space requirement is linear in the depth of the search.
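As a rough illustration, here is a minimal iterative DFS over a graph stored as an adjacency dictionary; the explicit list acts as the LIFO stack (the graph and node names are purely hypothetical):

```python
# Iterative DFS: the `stack` list is used as a LIFO stack.
def dfs(graph, start):
    visited, stack = [], [start]
    while stack:
        node = stack.pop()                      # last-in, first-out
        if node not in visited:
            visited.append(node)
            # Push neighbors so they are explored before the siblings of `node`.
            stack.extend(reversed(graph.get(node, [])))
    return visited

graph = {"A": ["B", "C"], "B": ["D"], "C": [], "D": []}
print(dfs(graph, "A"))  # ['A', 'B', 'D', 'C']
```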
How to install TensorFlow?
TensorFlow Installation Guide:
CPU: pip install tensorflow-cpu
GPU: pip install tensorflow-gpu
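Note that in recent TensorFlow releases the plain tensorflow package also ships with GPU support, so the separate tensorflow-gpu package may not be required. A quick post-install check (assuming the install succeeded):

```python
import tensorflow as tf

print(tf.__version__)                          # confirm the installed version
print(tf.config.list_physical_devices("GPU"))  # empty list if no GPU is visible
```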
What is alternate, artificial, compound and natural key?
Alternate Key: All candidate keys other than the primary key are known as alternate keys.
Artificial Key: If no obvious key, either standalone or compound, is available, the last resort is to simply create a key by assigning a number to each record or occurrence. This is known as an artificial (or surrogate) key.
Compound Key: When no single data element uniquely identifies an occurrence within a construct, combining multiple elements to create a unique identifier is known as a compound key.
Natural Key: A natural key is a data element that is already stored within a construct and is used as the primary key.
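A hypothetical sketch using sqlite3 from Python's standard library, showing a natural key, an artificial (surrogate) key, and a compound key (the table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Natural key: a data element already stored in the row (the ISBN) serves as the primary key.
conn.execute("CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT)")

# Artificial key: enrollment_id exists only to identify rows.
# Compound key: no single column is unique, but (student_id, course_code, term) together are;
# since that candidate key is not the primary key, it is also an alternate key here.
conn.execute("""
    CREATE TABLE enrollment (
        enrollment_id INTEGER PRIMARY KEY,
        student_id    INTEGER NOT NULL,
        course_code   TEXT NOT NULL,
        term          TEXT NOT NULL,
        UNIQUE (student_id, course_code, term)
    )
""")
conn.close()
```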
What are intelligent agents?
An intelligent agent is an autonomous entity that leverages sensors to understand a situation and make decisions. It can also use actuators to perform both simple and complex tasks.
At first, it might not perform a task particularly well, but it improves over time.
What is the lifetime of a variable?
A variable's lifetime begins when we first run the tf.Variable.initializer operation for that variable in a session, and it ends when we run the tf.Session.close operation.
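A minimal sketch in the TensorFlow 1.x style this answer refers to (using the tf.compat.v1 compatibility API available in TensorFlow 2):

```python
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

v = tf.Variable(3.0, name="v")

sess = tf.Session()
sess.run(v.initializer)   # the variable's lifetime starts here
print(sess.run(v))        # 3.0
sess.close()              # the variable is destroyed when the session closes
```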
What is a heuristic function?
A heuristic function ranks the alternatives at each branching step of a search algorithm, based on the available information, in order to decide which branch to follow.
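For instance, here is a minimal sketch of a common heuristic, the Manhattan distance on a grid, which pathfinding searches such as A* can use to rank candidate nodes (the coordinates are illustrative):

```python
# Manhattan distance: an estimate of the remaining cost from `node` to `goal` on a grid.
def manhattan(node, goal):
    (x1, y1), (x2, y2) = node, goal
    return abs(x1 - x2) + abs(y1 - y2)

# The search would expand the candidate with the most promising estimate first.
print(manhattan((0, 0), (3, 4)))  # 7
```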
How does data overfitting occur and how can it be fixed?
Overfitting occurs when a model learns the training data too closely, including its noise, and therefore performs poorly on unseen data. It can be prevented by using the following methodologies:
Cross-validation: The idea behind cross-validation is to split the training data in order to generate multiple mini train-test splits. These splits can then be used to tune your model (a short cross-validation sketch follows this list).
More training data: Feeding more data to the machine learning model can help in better analysis and classification. However, this does not always work.
Remove features: Many times, the data set contains irrelevant features or predictor variables that are not needed for analysis. Such features only increase the complexity of the model, thus leading to possibilities of data overfitting. Therefore, such redundant variables must be removed.
Early stopping: A machine learning model is trained iteratively, which allows us to check how well each iteration of the model performs. After a certain number of iterations, however, the model’s performance starts to saturate, and further training results in overfitting, so one must know where to stop the training. This is achieved by a mechanism called early stopping.
Regularization: Regularization can be done in a number of ways; the method depends on the type of learner you’re implementing. For example, pruning is performed on decision trees, the dropout technique is used on neural networks, and parameter tuning can also be applied to address overfitting issues.
Use ensemble models: Ensemble learning is a technique that creates multiple machine learning models and combines them to produce more accurate results. This is one of the best ways to prevent overfitting. An example is Random Forest, which uses an ensemble of decision trees to make more accurate predictions and to avoid overfitting.
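As referenced above, a minimal cross-validation sketch with scikit-learn (assumed to be installed; the dataset and model are only for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=100, random_state=0)

# Five train/validation splits; the spread of scores hints at over- or underfitting.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```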
In Python’s standard library, what packages would you say are the most useful for data scientists?
Python wasn’t built for data science. However, in recent years it has grown to become the go-to programming language for the following:
- Machine learning
- Predictive analytics
- Simple data analytics
- Statistics
For data science projects, the following packages from the Python ecosystem (installed with pip rather than bundled in the standard library itself) will make life easier and accelerate deliveries:
NumPy (to process large multidimensional arrays and matrices, with an extensive collection of high-level mathematical functions)
Pandas (to leverage built-in methods for rapidly combining, filtering, and grouping data)
SciPy (to extend NumPy’s capabilities and solve tasks related to integral calculus, linear algebra, and probability theory)
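A tiny sketch of the kind of work these packages make easy, assuming numpy and pandas are installed (the data here is made up):

```python
import numpy as np
import pandas as pd

values = np.array([1.0, 2.0, 3.0, 4.0])
print(values.mean())                       # fast vectorized math on arrays

df = pd.DataFrame({"team": ["a", "a", "b"], "score": [10, 20, 30]})
print(df.groupby("team")["score"].sum())   # rapid grouping and aggregation
```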
How can a Bayesian network be used to answer any query?
A Bayesian network represents the joint probability distribution over its variables, so any query can be answered by summing over all the relevant joint entries, i.e., marginalizing out the variables that are not part of the query.
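A hypothetical numeric sketch with two binary variables: the joint entries below are made up, but in practice a Bayesian network would supply them as products of its conditional probability tables.

```python
# Made-up joint distribution over (rain, grass).
joint = {
    ("rain", "wet"): 0.32, ("rain", "dry"): 0.08,
    ("no_rain", "wet"): 0.10, ("no_rain", "dry"): 0.50,
}

# P(rain | wet) = P(rain, wet) / sum of all entries consistent with "wet".
p_wet = sum(p for (r, g), p in joint.items() if g == "wet")
print(joint[("rain", "wet")] / p_wet)  # ~0.76
```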
List the extraction techniques used for dimensionality reduction.
- Independent component analysis
- Principal component analysis (see the sketch after this list)
- Kernel-based principal component analysis
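A minimal principal component analysis sketch with scikit-learn (assumed installed), reducing 4-dimensional data to 2 extracted components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)
X_reduced = PCA(n_components=2).fit_transform(X)
print(X_reduced.shape)  # (150, 2)
```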