Ensemble learning is a powerful approach in machine learning that combines multiple models to improve overall performance, accuracy, and robustness beyond what individual models can achieve alone.
By aggregating predictions from a diverse set of models, ensemble methods reduce errors caused by bias, variance, or noise and better capture the underlying data patterns.
Common ensemble techniques include bagging, boosting, and stacking, each leveraging different principles to build stronger predictive models.
Rather than relying on a single model, ensemble learning leverages the collective wisdom of multiple learners: a group of weak or diverse models, when combined, can outperform any single strong model, often resulting in lower generalisation error.
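To make the idea concrete, here is a minimal sketch of a simple voting ensemble, assuming scikit-learn and a synthetic dataset generated with make_classification (the dataset and model choices are illustrative, not prescriptive): three different classifiers are combined by majority vote.

```python
# Minimal sketch: combining three diverse classifiers with majority voting
# (scikit-learn, synthetic data for illustration only).
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Three diverse base learners; "hard" voting takes the majority class label.
ensemble = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression(max_iter=1000)),
        ("tree", DecisionTreeClassifier(max_depth=5)),
        ("nb", GaussianNB()),
    ],
    voting="hard",
)
ensemble.fit(X_train, y_train)
print("Voting ensemble accuracy:", ensemble.score(X_test, y_test))
```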

Bagging (Bootstrap Aggregating)
Bagging reduces variance and prevents overfitting by training multiple models independently on different random subsets (bootstrap samples) of the training data.
1. Each model is trained on a randomly sampled dataset with replacement
2. Final prediction is aggregated by voting (classification) or averaging (regression)
Common example: Random Forest, an ensemble of decision trees with feature randomness
Advantages: Reduces overfitting by averaging models, and works well for high-variance models like decision trees
Limitations: Does not reduce bias significantly, and requires multiple independent models, increasing training time
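The following is a minimal sketch of bagging with scikit-learn on a synthetic dataset (an assumption for illustration): BaggingClassifier uses a decision tree as its default base learner, each tree fit on a bootstrap sample, while RandomForestClassifier adds per-split feature randomness on top of bagging.

```python
# Minimal sketch of bagging (scikit-learn, synthetic data for illustration).
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Generic bagging: 100 base learners (decision trees by default),
# each trained on a bootstrap sample drawn with replacement.
bagging = BaggingClassifier(n_estimators=100, bootstrap=True, random_state=42)
bagging.fit(X_train, y_train)
print("Bagging accuracy:", bagging.score(X_test, y_test))

# Random Forest: bagging of decision trees plus random feature subsets per split.
forest = RandomForestClassifier(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)
print("Random Forest accuracy:", forest.score(X_test, y_test))
```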
Boosting
Boosting trains models sequentially, with each new model focusing on correcting the errors of its predecessors. The models are combined by weighted voting or averaging to form a strong predictive model.
1. Initially fits a base learner; subsequent learners focus more on misclassified or hard examples
2. Common algorithms: AdaBoost, Gradient Boosting Machines (GBM), XGBoost, LightGBM, CatBoost
Advantages: Reduces both bias and variance, and creates strong models by emphasising difficult cases
Limitations: More sensitive to noisy data and outliers, and sequential training makes it computationally intensive
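As a minimal sketch, the snippet below fits scikit-learn's AdaBoostClassifier and GradientBoostingClassifier on a synthetic dataset (an assumption for illustration); the library-specific variants listed above, such as XGBoost or LightGBM, follow the same fit/predict pattern.

```python
# Minimal sketch of boosting (scikit-learn, synthetic data for illustration).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# AdaBoost: re-weights training examples so later learners focus on
# previously misclassified (hard) examples.
ada = AdaBoostClassifier(n_estimators=100, random_state=42)
ada.fit(X_train, y_train)
print("AdaBoost accuracy:", ada.score(X_test, y_test))

# Gradient boosting: each new tree fits the residual errors of the
# current ensemble, shrunk by the learning rate.
gbm = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=42)
gbm.fit(X_train, y_train)
print("Gradient Boosting accuracy:", gbm.score(X_test, y_test))
```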
Stacking
Stacking combines multiple base models (level-0) by training a meta-model (level-1) on their outputs (predictions), learning how best to combine them for optimal accuracy.
1. Base models are diverse (different algorithms or hyperparameters)
2. Meta-model learns weights and corrections on the predictions of base models
3. Utilises cross-validation to avoid overfitting in meta-model training
Advantages: Flexibly combines heterogeneous models, and often outperforms individual models and simpler ensembles
Limitations: Requires careful design and tuning of base/meta models, and complexity can increase training time and risk of overfitting
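A minimal stacking sketch using scikit-learn's StackingClassifier on a synthetic dataset; the choice of a Random Forest and an SVM as level-0 models with logistic regression as the level-1 meta-model is purely illustrative. The cv=5 setting means the meta-model is trained on out-of-fold predictions from 5-fold cross-validation, which limits overfitting to the base models' training-set outputs.

```python
# Minimal sketch of stacking (scikit-learn, synthetic data for illustration).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Level-0: diverse base models; level-1: logistic regression meta-model.
# cv=5 means the meta-model sees cross-validated predictions, not
# predictions on data the base models were trained on.
stack = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("svm", SVC(probability=True, random_state=42)),
    ],
    final_estimator=LogisticRegression(max_iter=1000),
    cv=5,
)
stack.fit(X_train, y_train)
print("Stacking accuracy:", stack.score(X_test, y_test))
```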