Title: | Machine Learning Model Evaluation for 'h2o' Package |
---|---|
Description: | Enhances the H2O platform by providing tools for detailed evaluation of machine learning models. It includes functions for bootstrapped performance evaluation, extended F-score calculations, and various other metrics, aimed at improving model assessment. |
Authors: | E. F. Haghish [aut, cre, cph] |
Maintainer: | E. F. Haghish <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.4 |
Built: | 2025-03-17 21:18:44 UTC |
Source: | https://github.com/haghish/h2otools |
Extracts models' parameters from an AutoML grid
automlModelParam(model)
model |
an h2o AutoML object |
a data frame of models' parameters
E. F. Haghish
## Not run:
if (requireNamespace("h2o")) {
  library(h2o)
  h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)
  prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
  prostate <- h2o.importFile(path = prostate_path, header = TRUE)
  y <- "CAPSULE"
  prostate[, y] <- as.factor(prostate[, y])  # convert to factor for classification
  aml <- h2o.automl(y = y, training_frame = prostate, include_algos = "GLM",
                    max_models = 1, max_runtime_secs = 60)
  # extract the model parameters
  model.param <- automlModelParam(aml@leader)
}
## End(Not run)
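The value is described as a data frame of model parameters. As a rough illustration of that reshaping step only, the base-R sketch below turns a named list of parameters into a one-row data frame, keeping scalar entries and dropping vector-valued or empty ones. The helper `params_to_row()` and the sample parameter list are hypothetical; this is not how `automlModelParam()` is implemented, and it does not call h2o.

```r
# Hypothetical helper, not part of h2otools: flatten a named list of model
# parameters into a one-row data frame.
params_to_row <- function(params) {
  # keep only scalar atomic values; vectors, lists and NULLs are dropped
  keep <- vapply(params, function(p) is.atomic(p) && length(p) == 1L,
                 logical(1))
  as.data.frame(params[keep], stringsAsFactors = FALSE)
}

# made-up parameters, loosely resembling a GLM configuration
params <- list(alpha = 0.5, family = "binomial", lambda = c(0.1, 0.01),
               nfolds = 5L, hidden = NULL)
params_to_row(params)
```

Dropping non-scalar entries keeps the result rectangular; a real implementation might instead collapse vectors into strings so that no parameter is lost.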
Evaluates variable importance as well as bootstrapped variable importance for a single model or a grid of models
bootImportance(model, df, metric, n = 100)
model |
a model, or a grid of models, trained by the h2o machine learning software |
df |
dataset for testing the model. If "n" is greater than 1 (the default is 100), bootstrap samples are drawn from this dataset; otherwise, the entire dataset is used to evaluate the model |
metric |
character. The model evaluation metric to be passed to the boot R package, for example "AUC", "AUCPR", or "RMSE", depending on the model you have trained. Any evaluation metric provided for your H2O models can be specified here. |
n |
number of bootstrap samples |
a list containing the mean performance for the specified metric and other bootstrap results
E. F. Haghish
## Not run:
library(h2o)
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
df <- read.csv(prostate_path)
# prepare the dataset for analysis before converting it to an H2OFrame
df$CAPSULE <- as.factor(df$CAPSULE)
# convert the data frame to an H2OFrame and run the analysis
prostate.hex <- as.h2o(df)
aml <- h2o.automl(y = "CAPSULE", training_frame = prostate.hex, max_runtime_secs = 30)
# evaluate the model performance
perf <- h2o.performance(aml@leader, xval = TRUE)
# evaluate bootstrapped variable importance
# NOTE that the raw data.frame is given, not the 'H2OFrame'
imp <- bootImportance(model = aml@leader, df = df, metric = "RMSE", n = 500)
## End(Not run)
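The general idea of bootstrapped variable importance can be sketched in base R: compute an importance score on the full data, recompute it on `n` resamples drawn with replacement, and summarize the spread. In the sketch below, importance is the absolute t statistic of each predictor in a linear model, a stand-in chosen only because it needs no extra packages; it is not H2O's variable importance, and `boot_importance_sketch()` and `abs_t_importance()` are hypothetical names.

```r
# Hypothetical sketch, not the h2otools implementation.
# importance_fn(data, y) must return a named numeric vector (one value per predictor).
abs_t_importance <- function(d, y) {
  fit <- lm(reformulate(setdiff(names(d), y), response = y), data = d)
  abs(summary(fit)$coefficients[-1, "t value"])  # drop the intercept row
}

boot_importance_sketch <- function(df, y, importance_fn, n = 100) {
  full <- importance_fn(df, y)  # importance on the full dataset
  reps <- vapply(seq_len(n), function(i) {
    idx <- sample.int(nrow(df), replace = TRUE)  # bootstrap resample of rows
    importance_fn(df[idx, , drop = FALSE], y)
  }, numeric(length(full)))
  reps <- matrix(reps, nrow = length(full))  # variables x replicates
  data.frame(
    variable = names(full),
    importance = unname(full),
    boot_mean = rowMeans(reps),
    lower = apply(reps, 1, quantile, probs = 0.025, names = FALSE),
    upper = apply(reps, 1, quantile, probs = 0.975, names = FALSE),
    stringsAsFactors = FALSE
  )
}

set.seed(1)
imp <- boot_importance_sketch(mtcars[, c("mpg", "wt", "hp")], y = "mpg",
                              importance_fn = abs_t_importance, n = 50)
imp
```

Passing the importance measure as a function keeps the resampling loop independent of how importance is defined, so a model-specific measure could be swapped in.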
Evaluate model performance by bootstrapping from training dataset
bootPerformance(model, df, metric, n = 100)
model |
a model trained by the h2o machine learning software |
df |
training, validation, or testing dataset to bootstrap from |
metric |
character. The model evaluation metric to be passed to the boot R package, for example "AUC", "AUCPR", or "RMSE", depending on the model you have trained. Any evaluation metric provided for your H2O models can be specified here. |
n |
number of bootstrap samples |
a list containing the mean performance for the specified metric and other bootstrap results
E. F. Haghish
## Not run:
library(h2o)
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
df <- read.csv(prostate_path)
# prepare the dataset for analysis before converting it to an H2OFrame
df$CAPSULE <- as.factor(df$CAPSULE)
# convert the data frame to an H2OFrame and run the analysis
prostate.hex <- as.h2o(df)
aml <- h2o.automl(y = "CAPSULE", training_frame = prostate.hex, max_runtime_secs = 30)
# evaluate the model performance
perf <- h2o.performance(aml@leader, xval = TRUE)
# evaluate bootstrap performance for the training dataset
# NOTE that the raw data.frame is given, not the 'H2OFrame'
boot.perf <- bootPerformance(model = aml@leader, df = df, metric = "RMSE", n = 500)
## End(Not run)
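The bootstrap itself is simple: resample the evaluation rows with replacement `n` times, recompute the metric on each resample, and summarize the replicates. The base-R sketch below assumes the model's predictions for the dataset are already available; because each row's prediction does not depend on the other rows, resampling (observed, predicted) pairs is equivalent to re-scoring a fixed model on resampled rows. `boot_metric_sketch()` and `rmse()` are hypothetical helpers, and unlike the package (whose metric is described as being passed to the boot R package) this sketch uses only base R.

```r
# Hypothetical sketch, not the h2otools implementation.
rmse <- function(obs, pred) sqrt(mean((obs - pred)^2))

boot_metric_sketch <- function(observed, predicted, metric_fn, n = 100) {
  stopifnot(length(observed) == length(predicted), n >= 1)
  replicates <- vapply(seq_len(n), function(i) {
    idx <- sample.int(length(observed), replace = TRUE)  # resample rows
    metric_fn(observed[idx], predicted[idx])
  }, numeric(1))
  list(mean = mean(replicates),
       ci = quantile(replicates, c(0.025, 0.975), names = FALSE),  # percentile interval
       replicates = replicates)
}

set.seed(42)
res <- boot_metric_sketch(observed = 1:10, predicted = 1:10 + 2,
                          metric_fn = rmse, n = 200)
res$mean
```

With a constant error of 2 every replicate yields the same RMSE, which makes the output easy to verify; on real predictions the spread of `replicates` reflects the sampling variability of the metric.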
This function evaluates an R expression while capturing its printed output, messages, warnings, and errors. It returns a list containing the result of the evaluation along with all the captured texts.
capture(expr)
expr |
An R expression to evaluate. The expression is captured using
|
The function uses withCallingHandlers() and tryCatch() to capture side effects of evaluating the expression. Printed output is captured using capture.output(). Warnings and messages are intercepted and their default display is suppressed using invokeRestart("muffleWarning") and invokeRestart("muffleMessage"), respectively. If an error occurs, its message is stored and NULL is returned as the value.
A list with the following components:
value |
The result of evaluating expr. |
output |
A character vector with the printed output produced during evaluation. |
messages |
A character vector with any messages generated during evaluation. |
warnings |
A character vector with any warnings produced during evaluation. |
error |
A character string with the error message if an error occurred; otherwise NULL. |
## Not run:
# Example: capturing output, messages, warnings, and errors
captured <- capture({
  print("Hello, world!")
  message("This is a message.")
  warning("This is a warning.")
  42  # final value returned
})
# display the captured components
print(captured$output)    # printed output
print(captured$messages)  # messages
print(captured$warnings)  # warnings
print(captured$error)     # error message (if any)
print(captured$value)     # the evaluated result (42 in this example)
## End(Not run)
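The mechanism described for capture() can be reproduced in a few lines of base R. The sketch below, `capture_sketch()`, is a hypothetical re-implementation for illustration, not the package's source: `tryCatch()` turns an error into a stored message and a `NULL` value, `withCallingHandlers()` records warnings and messages and then muffles them so evaluation continues, and `utils::capture.output()` collects printed output.

```r
# Hypothetical re-implementation for illustration; not the h2otools source.
capture_sketch <- function(expr) {
  messages <- character(0)
  warnings <- character(0)
  error <- NULL
  value <- NULL
  # 'expr' is evaluated lazily, inside all of the handlers below
  output <- utils::capture.output({
    value <- withCallingHandlers(
      tryCatch(expr, error = function(e) {
        error <<- conditionMessage(e)  # store the error, return NULL
        NULL
      }),
      warning = function(w) {
        warnings <<- c(warnings, conditionMessage(w))
        invokeRestart("muffleWarning")  # suppress default display, continue
      },
      message = function(m) {
        messages <<- c(messages, conditionMessage(m))
        invokeRestart("muffleMessage")
      }
    )
  })
  list(value = value, output = output, messages = messages,
       warnings = warnings, error = error)
}

res <- capture_sketch({
  print("Hello, world!")
  message("This is a message.")
  warning("This is a warning.")
  42
})
str(res)
```

Note that the text recorded for a message keeps the trailing newline that `message()` appends, and that the error handler is an exiting handler, so evaluation stops at the first error.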
Checks the class of the input data.frame: it makes sure that the specified 'df' is indeed a data.frame and, moreover, that no column has class 'character' or 'ordered'. This function helps you ensure that your data is compatible with the h2o R package.
checkFrame(df, ignore = NULL, is.df = TRUE, no.char = TRUE, no.ordered = TRUE)
df |
data.frame object to evaluate |
ignore |
character vector of column names that should be ignored, if any. |
is.df |
logical. If TRUE, checks whether 'df' is a 'data.frame' |
no.char |
logical. If TRUE, checks whether 'df' has any columns of class 'character' |
no.ordered |
logical. If TRUE, checks whether 'df' has any 'ordered' factor columns |
nothing
E. F. Haghish
data(cars)
# no error is expected because the 'cars' dataset does not
# have 'ordered' or 'character' columns
checkFrame(cars)
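The three checks can be sketched in base R, under the assumption that a failed check raises an error (as the example's "no error is expected" suggests). `check_frame_sketch()` is hypothetical and only mirrors the documented arguments; it is not the checkFrame() source.

```r
# Hypothetical sketch mirroring checkFrame()'s documented arguments.
check_frame_sketch <- function(df, ignore = NULL, is.df = TRUE,
                               no.char = TRUE, no.ordered = TRUE) {
  if (is.df && !is.data.frame(df)) stop("'df' must be a data.frame")
  cols <- setdiff(names(df), ignore)  # skip columns listed in 'ignore'
  if (no.char) {
    bad <- cols[vapply(df[cols], is.character, logical(1))]
    if (length(bad)) stop("character columns found: ", paste(bad, collapse = ", "))
  }
  if (no.ordered) {
    bad <- cols[vapply(df[cols], is.ordered, logical(1))]
    if (length(bad)) stop("ordered factor columns found: ", paste(bad, collapse = ", "))
  }
  invisible(NULL)  # nothing is returned when all checks pass
}

check_frame_sketch(cars)  # passes silently
```

Naming the offending columns in the error message makes it easy to convert them (for example with `factor(x, ordered = FALSE)` or `as.factor()`) before calling `as.h2o()`.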
Calculates F-Measure for any given value of Beta
Fmeasure(perf, beta = 1, max = FALSE)
perf |
a h2o object of class |
beta |
numeric. The beta value; must be greater than zero |
max |
logical. Default is FALSE. If TRUE, only the highest F-Measure across all thresholds is reported, instead of the F-Measure for every threshold. |
a matrix of F-Measures for different thresholds or the highest F-Measure value
E. F. Haghish
## Not run:
library(h2o)
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)
y <- "CAPSULE"
prostate[, y] <- as.factor(prostate[, y])  # convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 30)
# evaluate the model performance
perf <- h2o.performance(aml@leader, xval = TRUE)
# evaluate the highest F-Measure for Beta = 3
Fmeasure(perf, beta = 3, max = TRUE)
# evaluate the highest F-Measure for Beta = 1.5
Fmeasure(perf, beta = 1.5, max = TRUE)
# evaluate the highest F-Measure for Beta = 4
Fmeasure(perf, beta = 4, max = TRUE)
## End(Not run)
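For reference, the F-measure at a given threshold combines precision P and recall R as F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta > 1 weights recall more heavily, beta < 1 favors precision, and beta = 1 gives the usual F1 score. The base-R sketch below applies that formula to vectors of per-threshold precision and recall. `fbeta_sketch()` and its inputs are hypothetical; it is not the Fmeasure() source, which works from an h2o performance object.

```r
# Hypothetical sketch: F-beta for each threshold from precision and recall.
fbeta_sketch <- function(precision, recall, beta = 1, max = FALSE) {
  stopifnot(is.numeric(beta), length(beta) == 1, beta > 0)
  b2 <- beta^2
  f <- (1 + b2) * precision * recall / (b2 * precision + recall)
  f[is.nan(f)] <- 0  # 0/0 (precision and recall both zero) is treated as F = 0
  if (max) base::max(f) else f
}

precision <- c(0.5, 1, 0)
recall <- c(1, 0.5, 0)
fbeta_sketch(precision, recall, beta = 1)
fbeta_sketch(precision, recall, beta = 2, max = TRUE)
```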
Retrieves the performance matrix for all thresholds
getPerfMatrix(perf)
perf |
a h2o object of class |
a matrix of performance measures for all thresholds
E. F. Haghish
## Not run:
library(h2o)
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)
y <- "CAPSULE"
prostate[, y] <- as.factor(prostate[, y])  # convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 30)
# evaluate the model performance
perf <- h2o.performance(aml@leader, xval = TRUE)
# get the performance matrix for all thresholds
getPerfMatrix(perf)
## End(Not run)
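To make the idea of a per-threshold performance matrix concrete, the base-R sketch below builds a small one from true labels and predicted scores: each distinct score is used as a threshold, the confusion counts are tallied, and precision and recall are derived. `perf_matrix_sketch()` is hypothetical; H2O's binomial metrics carry a richer per-threshold table, and getPerfMatrix() presumably retrieves that table rather than recomputing it.

```r
# Hypothetical sketch: confusion counts, precision and recall per threshold.
perf_matrix_sketch <- function(labels, scores) {
  thresholds <- sort(unique(scores), decreasing = TRUE)
  rows <- lapply(thresholds, function(t) {
    pred <- scores >= t  # predicted positive at this threshold
    tp <- sum(pred & (labels == 1))
    fp <- sum(pred & (labels == 0))
    fn <- sum((!pred) & (labels == 1))
    tn <- sum((!pred) & (labels == 0))
    c(threshold = t, tps = tp, fps = fp, fns = fn, tns = tn,
      precision = tp / (tp + fp), recall = tp / (tp + fn))
  })
  do.call(rbind, rows)  # one row per threshold
}

perf_matrix_sketch(labels = c(1, 0, 1, 0), scores = c(0.9, 0.8, 0.4, 0.1))
```

Threshold-dependent measures such as the F-measure or kappa can then be computed column-wise from the count columns of such a matrix.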
Extracts the model IDs from an H2O AutoML object or H2O grid
h2o.get_ids(automl)
automl |
a h2o |
a character vector of trained models' names (IDs)
E. F. Haghish
## Not run:
library(h2o)
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)
y <- "CAPSULE"
prostate[, y] <- as.factor(prostate[, y])  # convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 30)
# get the model IDs
ids <- h2o.get_ids(aml)
## End(Not run)
Calculates kappa for all thresholds
kappa(perf, max = FALSE)
perf |
a h2o object of class |
max |
logical. Default is FALSE. If TRUE, only the highest kappa across all thresholds is reported, instead of the kappa for every threshold. |
a matrix of kappa values for different thresholds or the highest kappa value
E. F. Haghish
## Not run:
library(h2o)
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)
y <- "CAPSULE"
prostate[, y] <- as.factor(prostate[, y])  # convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 30)
# evaluate the model performance
perf <- h2o.performance(aml@leader, xval = TRUE)
# evaluate the highest kappa across all thresholds
kappa(perf, max = TRUE)
## End(Not run)
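The documentation does not say which kappa statistic is computed; assuming it is Cohen's kappa, each threshold's value follows from its confusion counts as kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement (accuracy) and p_e is the agreement expected by chance from the row and column totals. `kappa_sketch()` is a hypothetical, vectorised illustration (one element per threshold), deliberately not named `kappa` to avoid masking base R's `kappa()`.

```r
# Hypothetical sketch: Cohen's kappa from confusion-matrix counts,
# vectorised so each element corresponds to one threshold.
kappa_sketch <- function(tp, fp, fn, tn, max = FALSE) {
  n <- tp + fp + fn + tn
  p_o <- (tp + tn) / n  # observed agreement (accuracy)
  # chance agreement: P(pred +) * P(actual +) + P(pred -) * P(actual -)
  p_e <- ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n^2
  k <- (p_o - p_e) / (1 - p_e)
  if (max) base::max(k) else k
}

# two thresholds: a mediocre one and a perfect one
kappa_sketch(tp = c(20, 10), fp = c(5, 0), fn = c(10, 0), tn = c(15, 10))
```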
Takes an h2o performance object of class "H2OBinomialMetrics", alongside a caret confusion matrix, and provides various model performance measures supported by h2o and caret
performance(perf)
perf |
h2o performance object of class "H2OBinomialMetrics" |
numeric vector
E. F. Haghish
## Not run:
library(h2o)
h2o.init(ignore_config = TRUE, nthreads = 2, bind_to_localhost = FALSE, insecure = TRUE)
prostate_path <- system.file("extdata", "prostate.csv", package = "h2o")
prostate <- h2o.importFile(path = prostate_path, header = TRUE)
y <- "CAPSULE"
prostate[, y] <- as.factor(prostate[, y])  # convert to factor for classification
aml <- h2o.automl(y = y, training_frame = prostate, max_runtime_secs = 30)
# evaluate the model performance
perf <- h2o.performance(aml@leader, xval = TRUE)
# compute more performance measures
performance(perf)
## End(Not run)
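The exact set of measures returned by performance() is not listed here, so the selection below is an assumption: a base-R sketch computing, from the four confusion-matrix counts at a single threshold, measures of the kind reported in caret's confusion-matrix summaries (accuracy, sensitivity, specificity, precision, F1, balanced accuracy). `binary_measures_sketch()` is hypothetical and, like the documented function, returns a named numeric vector.

```r
# Hypothetical sketch: common confusion-matrix measures for one threshold.
binary_measures_sketch <- function(tp, fp, fn, tn) {
  sensitivity <- tp / (tp + fn)  # recall / true positive rate
  specificity <- tn / (tn + fp)  # true negative rate
  c(accuracy = (tp + tn) / (tp + fp + fn + tn),
    sensitivity = sensitivity,
    specificity = specificity,
    precision = tp / (tp + fp),  # positive predictive value
    f1 = 2 * tp / (2 * tp + fp + fn),
    balanced_accuracy = (sensitivity + specificity) / 2)
}

binary_measures_sketch(tp = 20, fp = 5, fn = 10, tn = 15)
```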