Training a Machine Learning Model Using the Ensemble Approach

Ensemble learning is a way to combine multiple machine learning models and to base the final answer on the combined outputs of these models. A good description of ensemble learning is here.

*Illustration: the parable of the blind men and the elephant.* [Source: Patheos.com](https://www.patheos.com/blogs/driventoabstraction/2018/07/blind-men-elephant-folklore-knowledge/)
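
To make the idea concrete, here is a toy sketch of combining classifiers by majority vote; the three prediction vectors below are invented purely for illustration:

```r
# Toy example: three classifiers' predictions for the same four observations
# (the vectors are made up for illustration)
pred1 <- c("setosa", "setosa", "virginica", "versicolor")
pred2 <- c("setosa", "virginica", "virginica", "versicolor")
pred3 <- c("versicolor", "setosa", "virginica", "virginica")

votes <- data.frame(pred1, pred2, pred3)
# Majority vote, one observation (row) at a time
majority <- apply(votes, 1, function(x) names(which.max(table(x))))
majority  # "setosa" "setosa" "virginica" "versicolor"
```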

Two common examples of the ensemble approach are bagging and boosting.


Bagging creates a series of training sets from the original training set with a procedure called bootstrapping. The bootstrapped sets are random samples (with replacement) of the observations in the original training set, and each has the same number of observations as the original. Each set is then used to fit a machine learning model. The final outcome is the average output (for regression) or the majority vote (for classification). This combination is normally more robust than a single model.
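
Here is a quick sketch of what one bootstrapped set looks like, using the iris data as a stand-in for the original training set:

```r
# One bootstrapped training set: n rows drawn with replacement from the
# original n rows (iris used here as a stand-in original training set)
set.seed(1)
n <- nrow(iris)
boot_idx <- sample(n, size = n, replace = TRUE)
boot_set <- iris[boot_idx, ]
nrow(boot_set)               # same size as the original: 150
length(unique(boot_idx)) / n # about 63% of original rows appear at least once
```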

Some implementations of bagging in caret:

  • ctreeBag: bagging for decision trees (conditional inference trees)

  • bagFDA: bagged flexible discriminant analysis

  • ldaBag: bagging for linear discriminant analysis

  • plsBag: bagging for partial least squares regression

```r
library(caret)

# Assumes a 70/30 train/test split of the iris data:
inTrain  <- createDataPartition(iris$Species, p = 0.7, list = FALSE)
training <- iris[inTrain, ]; testing <- iris[-inTrain, ]

ModFit_bag <- train(as.factor(Species) ~ ., data = training,
                    method = "treebag",
                    importance = TRUE)
predict_bag <- predict(ModFit_bag, testing)
confusionMatrix(predict_bag, testing$Species)
plot(varImp(ModFit_bag))
```

In boosting, the process is sequential rather than parallel: the output of one model becomes the input to the next. The observations are weighted: if an observation is misclassified, it receives a higher weight for the next classifier, which is thereby pushed to focus on the hard cases.
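
Boosting fits into the same caret workflow as bagging. A minimal sketch, assuming the training/testing split from the bagging example above and using caret's "gbm" method (stochastic gradient boosting, which requires the gbm package):

```r
# Assumes training/testing from the bagging example above
library(caret)

ModFit_boost <- train(as.factor(Species) ~ ., data = training,
                      method = "gbm",     # stochastic gradient boosting
                      verbose = FALSE)
predict_boost <- predict(ModFit_boost, testing)
confusionMatrix(predict_boost, testing$Species)
```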