BoostingDecisionTrees
Documentation for BoostingDecisionTrees.
BoostingDecisionTrees.BoostingDecisionTrees — Module
BoostingDecisionTrees
A Julia module for decision tree and boosting algorithms, including decision stumps, Gini impurity, and information gain utilities.
Overview
This module provides tools for training and evaluating simple decision trees, with support for both Gini impurity and information gain as splitting criteria.
Features
- Splitting Criteria: Supports both Gini impurity and information gain for feature selection.
- Utilities: Includes helper functions for entropy, Gini impurity, and majority voting.
Exports
- Decision Tree Functions:
- train_tree: Train a decision tree on a dataset.
- predict: Make predictions using a trained decision tree.
- AdaBoost Functions:
- train_adaboost: Train an AdaBoost model on a dataset.
- predict: Make predictions using a trained AdaBoost model.
BoostingDecisionTrees.AdaBoost — Type
AdaBoost(learner, alphas)
A strong ensemble classifier built from multiple weak learners.
Each new learner focuses on correcting the errors made by its predecessors.
# Fields
- `learner::Vector{DecisionTree}`: A collection of DecisionTree objects. Each tree acts as a weak classifier that makes a prediction based on a single feature threshold.
- `alphas::Vector{Float64}`: A vector of floating-point weights, one per tree, giving that tree's voting power. A higher alpha value means the stump was more accurate during the training phase.
BoostingDecisionTrees.DecisionNode — Type
DecisionNode
A decision node in a decision tree that splits data based on a feature and threshold.
Fields
- feature::Int: the feature index used for splitting.
- threshold::Float64: the threshold value for the split.
- left::TreeNode: the left subtree (samples where feature ≤ threshold).
- right::TreeNode: the right subtree (samples where feature > threshold).
BoostingDecisionTrees.LeafNode — Type
LeafNode
A leaf node holding a predicted class label.
BoostingDecisionTrees.TreeNode — Type
TreeNode
Abstract type for decision tree nodes.
BoostingDecisionTrees.best_split — Method
best_split(feature, labels)
Find the best threshold for splitting a feature vector, minimizing Gini impurity.
Arguments
- feature::AbstractVector{<:Real}: A vector of numerical feature values.
- labels::AbstractVector: A vector of class labels (same length as feature).
Returns
- best_threshold::Union{Float64, Nothing}: The best numerical value to split the feature on. Returns nothing if no split is possible.
- best_gini::Float64: The weighted Gini impurity after the split.
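A common convention for the threshold search is to try the midpoints between consecutive sorted unique feature values. The sketch below illustrates that convention with a hypothetical helper, `candidate_thresholds`; the module's actual search may differ.

```julia
# Candidate thresholds as midpoints between consecutive sorted unique values.
# Hypothetical helper for illustration; not part of the exported API.
function candidate_thresholds(feature::AbstractVector{<:Real})
    u = sort(unique(feature))
    return [(u[i] + u[i+1]) / 2 for i in 1:length(u)-1]
end

candidate_thresholds([1.0, 2.0, 3.0, 4.0])  # [1.5, 2.5, 3.5]
```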
Examples
julia> feature = [1.0, 2.0, 3.0, 4.0];
julia> labels = ["A", "A", "B", "B"];
julia> best_threshold, best_gini = best_split(feature, labels)
(2.5, 0.0)
BoostingDecisionTrees.createWeightedDataset — Method
createWeightedDataset(X, y, weights)
Create a new dataset by sampling rows from X and y according to the probability distribution defined by weights; samples with higher weights are more likely to be selected.
Arguments
- X::AbstractMatrix: rows are samples, columns are features.
- y::AbstractVector: class labels for each sample.
- weights::Vector{Float64}: weight of each sample in the given dataset. The sum of all weights should be 1.
Returns
- X_prime: A resampled matrix of the same dimensions and type as X.
- y_prime: A resampled vector of the same length and type as y.
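The weight-guided sampling step can be sketched as inverse-CDF sampling over the cumulative weights. `resample_indices` below is a hypothetical helper, not the module's internal code:

```julia
using Random

# Draw length(weights) row indices with replacement, where the probability of
# picking row i is proportional to weights[i] (inverse-CDF sampling).
function resample_indices(weights::Vector{Float64}, rng=Random.default_rng())
    cum = cumsum(weights)
    return [searchsortedfirst(cum, rand(rng) * cum[end]) for _ in weights]
end
```

The resampled dataset is then `idx = resample_indices(weights); X[idx, :], y[idx]`.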
BoostingDecisionTrees.gini_impurity — Method
gini_impurity(classes)
Compute the Gini impurity of a vector of class labels.
Arguments
classes::AbstractVector: A collection of class labels.
Returns
Float64: The Gini impurity of the input vector. Returns 0 if the input is empty.
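For reference, the Gini impurity of labels with class proportions pᵢ is 1 − Σ pᵢ². A minimal sketch consistent with the documented behavior (`gini_sketch` is a hypothetical name, not the exported function):

```julia
# Gini impurity: 1 minus the sum of squared class proportions.
function gini_sketch(classes::AbstractVector)
    isempty(classes) && return 0.0            # documented: empty input -> 0
    n = length(classes)
    return 1.0 - sum((count(==(c), classes) / n)^2 for c in unique(classes))
end

gini_sketch(["A", "A", "B", "B"])  # 0.5: maximally mixed two-class vector
```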
BoostingDecisionTrees.information_gain — Method
information_gain(X_column::AbstractVector{<:Real}, y::AbstractVector)
Compute the best information gain obtainable by splitting a numeric feature column using a threshold (x ≤ t vs. x > t).
Returns
- best_threshold::Float64: threshold yielding maximum information gain.
- best_gain::Float64: the corresponding information gain.
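Information gain is the drop in entropy from the parent node to the weighted average of the two children. A sketch of gain at one fixed threshold, using hypothetical helpers `entropy` and `gain_at` (the module's internal implementation may differ):

```julia
# Shannon entropy of a label vector (0.0 for an empty or pure vector).
entropy(y) = isempty(y) ? 0.0 :
    -sum(p * log2(p) for p in (count(==(c), y) / length(y) for c in unique(y)))

# Information gain of the split x <= t vs. x > t.
function gain_at(x::AbstractVector{<:Real}, y::AbstractVector, t::Real)
    left, right = y[x .<= t], y[x .> t]
    n = length(y)
    return entropy(y) -
           (length(left) / n) * entropy(left) -
           (length(right) / n) * entropy(right)
end

gain_at([1.0, 2.0, 3.0, 4.0], ["A", "A", "B", "B"], 2.5)  # 1.0: perfect split
```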
BoostingDecisionTrees.load_data_iris — Method
load_data_iris(path)
Load the Iris dataset from a CSV file, shuffle the observations, and split features from labels.
AI Disclaimer
This helper method was generated by AI.
Arguments
path::String: The file path to the CSV file (e.g., "src/data/Iris.csv").
Returns
- X::Matrix: A matrix of feature values (columns 2 through 5).
- y::Vector: A vector of target labels.
BoostingDecisionTrees.predict — Method
predict(model, X)
Predict class labels for samples in X using a trained AdaBoost classifier.
The function sums the weighted votes of all decision trees in the model to determine the most likely class for each sample.
Arguments
- model::AdaBoost: A trained AdaBoost structure.
- X::AbstractMatrix: rows are samples, columns are features.
Returns
Vector: A vector of predicted labels, with the same type as the labels found in the model's learners.
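The weighted-vote step described above can be sketched as adding each learner's alpha to the score of the label it predicts and returning the label with the largest total. `weighted_vote` is a hypothetical helper, not the exported `predict`:

```julia
# Weighted majority vote over per-learner predictions for one sample.
function weighted_vote(votes::AbstractVector, alphas::Vector{Float64})
    scores = Dict{eltype(votes),Float64}()
    for (v, a) in zip(votes, alphas)
        scores[v] = get(scores, v, 0.0) + a   # accumulate this learner's say
    end
    return argmax(scores)   # key with the highest summed alpha (Julia >= 1.7)
end

weighted_vote(["a", "b", "a"], [0.6, 1.0, 0.5])  # "a": 1.1 beats 1.0
```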
BoostingDecisionTrees.predict — Method
predict(model::AdaBoost, X::AbstractVector)
A convenience method for predicting the label of a single sample.
Arguments
- model::AdaBoost: A trained AdaBoost structure.
- X::AbstractVector: A single sample represented as a vector of features.
Returns
- The predicted label for the single input sample.
BoostingDecisionTrees.predict — Method
predict(tree::TreeNode, X::AbstractMatrix)
Make predictions for multiple samples using the decision tree.
Arguments
- tree::TreeNode: a trained decision tree.
- X::AbstractMatrix: rows are samples, columns are features.
Returns
Vector{Any}: predicted class labels for each sample in X.
Examples
julia> tree = DecisionNode{String}(1, 2.5, LeafNode("a"), LeafNode("b"), String);
julia> X = [1.0 2.0; 3.0 0.5; 2.0 1.5];
julia> preds = predict(tree, X)
3-element Vector{String}:
"a"
"b"
"a"BoostingDecisionTrees.train_adaboost — Method
train_adaboost(X, y; iterations, max_alpha, max_depth)
Trains an AdaBoost classifier on the given dataset.
Arguments
- X::AbstractMatrix: rows are samples, columns are features.
- y::AbstractVector: class labels for each sample.
- iterations::Integer: maximum number of weak learners; training stops early on a perfect fit. Must be in the range [1, Inf). Default is 50.
- max_alpha::Float64: A cap on the "amount of say" (alpha) of any single stump. Default is 2.5. The larger the value, the more a stump with a perfect result dominates the vote.
- max_depth::Integer: Maximum depth of each tree. Default is 1, which is equivalent to a decision stump.
Returns
AdaBoost: a trained classifier with learners and alphas.
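The textbook AdaBoost "amount of say" is α = ½·ln((1 − ε)/ε) for weighted error ε, and max_alpha caps it so a near-perfect stump cannot dominate the vote. A sketch assuming that textbook formula (`capped_alpha` is hypothetical; the module's exact formula may differ):

```julia
# Classic AdaBoost weight, clamped to [-max_alpha, max_alpha].
capped_alpha(err::Float64, max_alpha::Float64=2.5) =
    clamp(0.5 * log((1 - err) / err), -max_alpha, max_alpha)

capped_alpha(0.5)    # 0.0: a coin-flip learner gets no say
capped_alpha(1e-9)   # 2.5: clamped instead of diverging
```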
BoostingDecisionTrees.train_tree — Method
train_tree(X, y; max_depth=5, criterion=:gini)
Train a decision tree using numeric threshold splits.
Arguments
- X::AbstractMatrix: feature matrix.
- y::AbstractVector: class labels.
- max_depth::Int: maximum tree depth.
- criterion::Symbol: :information_gain or :gini.
Returns
TreeNode: the root of the trained decision tree.