oracle.tester module

Module for testing hierarchical models in the ORACLE framework.

class oracle.tester.Tester

Bases: object

Top-level class providing testing functionality for hierarchical classification models.

create_classification_report(y_true, y_pred, file_name=None)

Generates a classification report comparing true and predicted labels, optionally writing it to a CSV file. The input arrays are first filtered to keep only entries whose true label is not None; the report is then computed with scikit-learn’s classification_report function. If a file name is provided, the detailed per-class report is also exported as a CSV file.

Parameters:
  • y_true (array-like) – Array of true labels. Only entries where the label is not None will be considered.

  • y_pred (array-like) – Array of predicted labels, corresponding to y_true.

  • file_name (str, optional) – The file path where the CSV report will be saved. If None, the CSV file is not generated.

Returns:

A text summary of the classification report.

Return type:

str
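As a rough, self-contained sketch of what this method does internally (the class labels, file name, and use of the dict form of the report for CSV export are illustrative assumptions, not the actual implementation):

```python
import pandas as pd
from sklearn.metrics import classification_report

y_true = ["SN Ia", None, "SN II", "SN Ia"]    # None entries are dropped
y_pred = ["SN Ia", "SN II", "SN II", "SN II"]

# Keep only entries with a non-None true label.
pairs = [(t, p) for t, p in zip(y_true, y_pred) if t is not None]
yt, yp = zip(*pairs)

# Text summary, as returned by the method.
report = classification_report(yt, yp, zero_division=0)

# Optional CSV export via the dict form of the report.
report_dict = classification_report(yt, yp, output_dict=True, zero_division=0)
pd.DataFrame(report_dict).transpose().to_csv("report.csv")
print(report)
```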

create_loss_history_plot()

Creates and saves a plot of the training and validation loss history.

Note

  • The training- and validation-loss NumPy (.npy) files must exist in the specified directory.

  • The ‘plot_train_val_history’ function must be properly defined and accessible.
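A minimal stand-alone sketch of such a plot; the file names (train_loss.npy, val_loss.npy) and output name are assumptions, not the actual ones used by the framework:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# Stand-ins for the loss histories written during training.
np.save("train_loss.npy", np.linspace(1.0, 0.2, 50))
np.save("val_loss.npy", np.linspace(1.1, 0.35, 50))

train = np.load("train_loss.npy")
val = np.load("val_loss.npy")

plt.figure()
plt.plot(train, label="train")
plt.plot(val, label="validation")
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.savefig("loss_history.pdf")
plt.close()
```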

create_metric_phase_plots()

Generates phase plots for key evaluation metrics across all experimental phases. This method iterates over a predefined list of metrics (‘f1-score’, ‘precision’, ‘recall’), retrieving the corresponding metric values across different phases by invoking the get_metric_over_all_phases method. For each metric, it then generates two types of plots:

  1. Class-wise performance over all phases using plot_class_wise_performance_over_all_phases.

  2. Level-averaged performance over all phases using plot_average_performance_over_all_phases.

The plots are saved to the directory specified by the model_dir attribute.

get_metric_over_all_phases(metric)

Calculates and aggregates the specified metric (f1-score, precision, or recall) across all non-root taxonomy depths.

Parameters:

metric (str) – The name of the metric to process. Must be one of [‘f1-score’, ‘precision’, ‘recall’].

Returns:

A dictionary where each key is a taxonomy depth (int) and each value is a pandas DataFrame containing the day-wise aggregated metric data.

Return type:

dict

Raises:

AssertionError – If the provided metric is not one of the accepted values.
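An illustrative reconstruction of the return shape, {depth: DataFrame of day-wise metric values}; the class names, day labels, and scores below are made up for the example:

```python
import pandas as pd

metric = "f1-score"
# The method asserts the metric name before processing.
assert metric in ["f1-score", "precision", "recall"]

# Per-depth, per-day, per-class scores as they might come out of the
# per-phase classification reports (hypothetical values).
raw = {
    1: {"day_2": {"SN": 0.80, "AGN": 0.70},
        "day_8": {"SN": 0.85, "AGN": 0.75}},
    2: {"day_2": {"SN Ia": 0.60, "SN II": 0.55},
        "day_8": {"SN Ia": 0.68, "SN II": 0.61}},
}

# One DataFrame per taxonomy depth: rows are classes, columns are days.
metric_over_phases = {
    depth: pd.DataFrame(per_day)
    for depth, per_day in raw.items()
}
print(metric_over_phases[1])
```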

make_embeddings_for_AD(test_loader, d)

Generate latent space embeddings for anomaly detection (AD) analysis and save the results. This method processes the test dataset by running model inference, extracting latent embeddings, and gathering the corresponding class labels and identifiers. It then creates a UMAP plot of the embeddings and saves both the plot and a CSV file containing the combined embedding data.

Parameters:
  • test_loader (iterable) – A DataLoader or iterable over the test dataset where each batch is a dictionary with keys ‘label’, ‘raw_label’, ‘id’, and any tensor data needed for embedding generation.

  • d (int or float) – A parameter indicating the number of days (or a similar metric) used in naming the output files and plots.
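A sketch of the bookkeeping part of this method: collecting latent embeddings with their labels and identifiers and writing a combined CSV. The UMAP plotting step is omitted, and all shapes, column names, and the output file name are assumptions:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n, latent_dim, d = 6, 4, 8          # d: number of days, used in the file name

embeddings = rng.normal(size=(n, latent_dim))       # model latent outputs
labels = ["SN", "SN", "AGN", "SN", "AGN", "AGN"]    # class label per object
ids = [f"obj_{i}" for i in range(n)]                # object identifiers

# Combine embedding dimensions, labels, and ids into one table.
df = pd.DataFrame(embeddings, columns=[f"z{i}" for i in range(latent_dim)])
df["label"] = labels
df["id"] = ids
df.to_csv(f"embeddings_{d}_days.csv", index=False)
```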

merge_performance_tables(days)

Merges performance tables for the specified days and prints LaTeX-formatted results.

Parameters:

days (iterable) – A collection (e.g., list) of identifiers representing different report days.

Returns:

None

Side Effects:

Outputs a LaTeX-formatted table to standard output.

run_all_analysis(test_loader, d)

Runs analysis on the test set and generates evaluation plots and reports. This method sets the model to evaluation mode and iterates over test_loader to perform inference. It aggregates the predicted class probabilities and the corresponding true labels, translating them into a hierarchical format based on the provided taxonomy. For each depth level (excluding the root level), it computes:

  • The recovery of true labels for the corresponding hierarchy level.

  • Confusion matrices for recall and precision, saving the plots as PDF files.

  • ROC curves for the predicted probabilities, saving the plots as PDF files.

  • A classification report that is both printed on the console and saved as a CSV file.

Parameters:
  • test_loader (iterable) – An iterable (e.g., DataLoader) that yields batches of test data, where each batch is a dictionary containing tensors (and other values) including the key ‘label’.

  • d (int) – An integer representing the number of days used in the trigger, incorporated into the naming of output files.

Returns:

None
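A conceptual sketch of the aggregation step only, with the PyTorch inference loop replaced by precomputed class scores (all values are stand-ins): probabilities are obtained via softmax, predictions via argmax, and a recall-style confusion matrix is built per level:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

rng = np.random.default_rng(1)
logits = rng.normal(size=(10, 3))                   # stand-in class scores
y_true = np.array([0, 0, 1, 1, 2, 2, 0, 1, 2, 0])  # true class indices

# Softmax over the class axis -> predicted probabilities.
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)
y_pred = probs.argmax(axis=1)

# Recall-style confusion matrix: each row (true class) sums to 1.
cm_recall = confusion_matrix(
    y_true, y_pred, labels=[0, 1, 2], normalize="true"
)
```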

setup_testing(model_dir, device)

Sets up the testing environment by configuring the model directory and device used for testing.

Parameters:
  • model_dir (str) – The directory path where the model files are stored.

  • device (torch.device or str) – The device on which the model will run (e.g., CPU or GPU).

Returns:

None
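A minimal stand-in illustrating the setup pattern the method follows: store the model directory and the device for later use. The real Tester accepts a torch.device; a plain string is used here to keep the sketch self-contained, and the class name and paths are hypothetical:

```python
class TesterSketch:
    """Stand-in for oracle.tester.Tester, setup step only."""

    def setup_testing(self, model_dir, device):
        self.model_dir = model_dir   # where model files and outputs live
        self.device = device         # e.g. "cpu" or "cuda:0"

tester = TesterSketch()
tester.setup_testing("models/run_01", "cpu")
```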