Util

helpers

unfair_data_generator.util.helpers.get_class_name(cls)

Convert a binary classification label (0 or 1) to a descriptive class name.

Parameters:

cls (int) – Numerical class identifier representing a class (0 or 1).

Returns:

Descriptive name for the class. An empty string is returned for invalid inputs.

0 maps to “Negative class”

1 maps to “Positive class”

Return type:

str
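The mapping described above is small enough to sketch directly. The following is only an illustration of the documented behavior, not the library's source; `class_name_sketch` is a hypothetical name introduced here.

```python
def class_name_sketch(cls):
    """Illustrative stand-in for get_class_name (not the library's code)."""
    names = {0: "Negative class", 1: "Positive class"}
    # Any value other than 0 or 1 falls back to an empty string, as documented.
    return names.get(cls, "")


print(class_name_sketch(1))
print(repr(class_name_sketch(7)))
```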

unfair_data_generator.util.helpers.get_group_marker(group)

Map numerical group identifier to matplotlib marker shape.

Provides distinct visual markers for plotting different groups in scatter plots and other visualizations.

Parameters:

group (int) – Numerical identifier for the group.

Returns:

Matplotlib marker string. One of: ‘o’, ‘s’, ‘^’, ‘D’, ‘*’, ‘v’, ‘<’, ‘>’, ‘p’, ‘h’, ‘H’, ‘+’, ‘x’, ‘8’.

Return type:

str
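One plausible way to realize such a lookup is an indexed list of marker strings. The sketch below assumes that group identifiers index the list in the order shown above and wrap around past the end; neither detail is confirmed by this page, and `group_marker_sketch` is a hypothetical name.

```python
# The fourteen marker strings listed in the documentation.
MARKERS = ["o", "s", "^", "D", "*", "v", "<", ">", "p", "h", "H", "+", "x", "8"]


def group_marker_sketch(group):
    """Illustrative stand-in for get_group_marker (not the library's code)."""
    # Assumption: ids map to markers in list order and wrap around,
    # so every non-negative id yields some valid matplotlib marker.
    return MARKERS[group % len(MARKERS)]


print([group_marker_sketch(g) for g in range(4)])
```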

unfair_data_generator.util.helpers.get_group_name(unique_groups, group)

Map numerical group identifier to descriptive weather-based group name.

This function provides human-readable names for sensitive attribute groups based on the total number of groups in the dataset.

Parameters:

unique_groups (ndarray) – Array containing all unique group identifiers in the dataset. The length determines which naming convention to use. Example: [0, 1] or [0, 1, 2, 3].
group (int) – Numerical identifier representing a specific group.

Returns:

Descriptive group name.

Return type:

str

Note

Supported group configurations with weather names:

2 groups: Sunny, Cloudy

3 groups: Sunny, Cloudy, Rainy

4 groups: Sunny, Cloudy, Rainy, Windy

5 groups: Sunny, Cloudy, Rainy, Windy, Stormy
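Each supported configuration is a prefix of the same five weather names, so the lookup can be sketched with a single list. This is an illustration only: the rule that a group's position among the sorted identifiers selects its name, and the error raised for unsupported counts, are assumptions. `group_name_sketch` is a hypothetical name.

```python
WEATHER_NAMES = ["Sunny", "Cloudy", "Rainy", "Windy", "Stormy"]


def group_name_sketch(unique_groups, group):
    """Illustrative stand-in for get_group_name (not the library's code)."""
    count = len(unique_groups)
    if not 2 <= count <= 5:
        # Assumption: the real function may handle this case differently.
        raise ValueError("Only 2 to 5 groups are supported.")
    names = WEATHER_NAMES[:count]
    # Assumption: the group's rank among the sorted ids picks the name.
    position = sorted(unique_groups).index(group)
    return names[position]


print(group_name_sketch([0, 1, 2], 2))
```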

unfair_data_generator.util.helpers.get_params_for_certain_equality_type(equality_type, sensitive_group_count)

Generate group-specific parameters for different fairness equality types.

This function provides pre-configured parameter sets that simulate different fairness scenarios in machine learning. Each equality type addresses specific fairness concerns by adjusting dataset generation parameters.

Parameters:

equality_type (str) – The fairness criterion to simulate. Must be one of:
- “Equal quality”
  Ensures the classifier performs equally well for all sensitive groups by adjusting class_sep.
- “Demographic parity”
  Ensures equal proportions of positive and negative samples across groups by adjusting weights.
- “Equal opportunity”
  Ensures equal True Positive Rates (TPR) for all sensitive groups by adjusting weights and class_sep.
- “Equalized odds”
  Ensures both TPR and False Positive Rates (FPR) are equal across groups by fine-tuning class_sep.
sensitive_group_count (int) – Number of sensitive groups in the dataset (2, 3, 4, or 5). Determines which groups will be included and their parameters.

Returns:

Dictionary containing parameters for each group. Parameters may include:

‘class_sep’ (float) – Controls class separability (affects classification difficulty). Higher values make classification easier for that group.

‘weights’ (list of float) – Controls class distribution as [negative_weight, positive_weight]. Affects the proportion of positive vs. negative samples.

Note

weights influence the proportion of positive and negative class samples.
class_sep determines the separability of clusters, affecting accuracy and other metrics.

Examples

>>> params = get_params_for_certain_equality_type("Equal quality", 2)
>>> print(params)
{'Sunny': {'class_sep': 1}, 'Cloudy': {'class_sep': 0.6}}

>>> params = get_params_for_certain_equality_type("Demographic parity", 3)
>>> print(params)
{'Sunny': {'weights': [0.7, 0.3]}, 'Cloudy': {'weights': [0.2, 0.8]},
'Rainy': {'weights': [0.4, 0.6]}}

Return type:

dict

model_trainer

unfair_data_generator.util.model_trainer.evaluate_fairness_by_group(y_true, y_pred, groups, sensitive_groups)

Evaluate fairness metrics for each sensitive group.

This function calculates metrics such as accuracy, True Positive Rate (TPR), False Positive Rate (FPR), and confusion matrices for each sensitive group in the dataset.

Parameters:

y_true (ndarray) – Ground truth (true) target values.
y_pred (ndarray) – Predicted labels from the classifier.
groups (ndarray) – Sensitive group assignments for each sample. Each element indicates which sensitive group the corresponding sample belongs to.
sensitive_groups (list) – List of unique sensitive group identifiers present in the dataset. For example, [“Sunny”, “Cloudy”, “Rainy”].

Returns:

Dictionary containing fairness metrics for each sensitive group. Each key is formatted as “Group {group}” and maps to a dictionary containing:

‘Confusion Matrix’ (list of lists) – 2x2 confusion matrix as nested lists in the format [[TN, FP], [FN, TP]].

‘Accuracy’ (float) – Classification accuracy, calculated as \(\frac{TP + TN}{TP + TN + FP + FN}\).

‘True Positive Rate (TPR)’ (float) – Sensitivity or recall, calculated as \(\frac{TP}{TP + FN}\). Returns 0 if no positive samples exist in the group.

‘False Positive Rate (FPR)’ (float) – Calculated as \(\frac{FP}{FP + TN}\). Returns 0 if no negative samples exist in the group.

Return type:

dict
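The per-group computation can be sketched in plain Python as follows. This is not the library's implementation, only a minimal version that follows the documented formulas, key names, and zero fallbacks for TPR and FPR. The 0.0 accuracy for a group with no samples is an assumption, and `fairness_by_group_sketch` is a hypothetical name.

```python
def fairness_by_group_sketch(y_true, y_pred, groups, sensitive_groups):
    """Illustrative per-group metrics (not the library's code)."""
    results = {}
    for group in sensitive_groups:
        tn = fp = fn = tp = 0
        # Count confusion-matrix cells for samples that belong to this group.
        for truth, pred, member in zip(y_true, y_pred, groups):
            if member != group:
                continue
            if truth == 1 and pred == 1:
                tp += 1
            elif truth == 1:
                fn += 1
            elif pred == 1:
                fp += 1
            else:
                tn += 1
        total = tn + fp + fn + tp
        results[f"Group {group}"] = {
            "Confusion Matrix": [[tn, fp], [fn, tp]],
            # Assumption: an empty group gets accuracy 0.0.
            "Accuracy": (tp + tn) / total if total else 0.0,
            # Documented fallback: 0 when the group has no positive samples.
            "True Positive Rate (TPR)": tp / (tp + fn) if (tp + fn) else 0,
            # Documented fallback: 0 when the group has no negative samples.
            "False Positive Rate (FPR)": fp / (fp + tn) if (fp + tn) else 0,
        }
    return results


y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
groups = ["Sunny", "Sunny", "Sunny", "Cloudy", "Cloudy", "Cloudy"]
metrics = fairness_by_group_sketch(y_true, y_pred, groups, ["Sunny", "Cloudy"])
print(metrics["Group Sunny"]["Confusion Matrix"])   # [[1, 0], [1, 1]]
print(metrics["Group Cloudy"]["Confusion Matrix"])  # [[1, 1], [0, 1]]
```

In this toy input both groups have the same accuracy (2 of 3 correct), yet their TPR and FPR differ, which is the kind of gap the per-group breakdown is meant to expose.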

unfair_data_generator.util.model_trainer.train_and_evaluate_model_with_classifier(X, y, Z)

Train a Random Forest classifier and evaluate performance and fairness across sensitive groups.

This function separates sensitive features from the dataset, trains a Random Forest classifier, and calculates fairness metrics for each sensitive group.

Parameters:

X (ndarray) – Training data feature matrix. Contains the input features used for classification, excluding sensitive attributes.
y (ndarray) – Target values for classification. Binary labels where 0 represents the negative class and 1 represents the positive class.
Z (ndarray) – Sensitive group information for each sample. Each element indicates which sensitive/protected group the corresponding sample belongs to.

Returns:

Fairness metrics organized by sensitive group. Each key is a group name (as determined by the get_group_name function) and maps to a dictionary containing:

‘Accuracy’ (float) – Classification accuracy for the group.

‘True Positive Rate (TPR)’ (float) – Sensitivity/recall for the group, calculated as \(\frac{TP}{TP+FN}\).

‘False Positive Rate (FPR)’ (float) – Calculated as \(\frac{FP}{FP+TN}\) for the group.

‘Samples in Positive class’ (int) – Number of samples predicted as positive class (TP + FP).

‘Samples in Negative class’ (int) – Number of samples predicted as negative class (TN + FN).

‘Confusion Matrix’ (dict) – Detailed breakdown with keys ‘True Negative (TN)’, ‘False Positive (FP)’, ‘False Negative (FN)’, ‘True Positive (TP)’.

Return type:

dict
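To make the relationship between the returned fields concrete, the sketch below builds one group's entry from its four confusion-matrix counts. It mirrors the documented key names and formulas but is not the library's code. The zero fallbacks for empty denominators are an assumption carried over from evaluate_fairness_by_group, and `group_summary_sketch` is a hypothetical name.

```python
def group_summary_sketch(tn, fp, fn, tp):
    """Build one group's metrics entry from confusion counts (illustrative)."""
    total = tn + fp + fn + tp
    return {
        "Accuracy": (tp + tn) / total if total else 0.0,
        "True Positive Rate (TPR)": tp / (tp + fn) if (tp + fn) else 0,
        "False Positive Rate (FPR)": fp / (fp + tn) if (fp + tn) else 0,
        # Counts of *predicted* labels, not of true labels.
        "Samples in Positive class": tp + fp,
        "Samples in Negative class": tn + fn,
        "Confusion Matrix": {
            "True Negative (TN)": tn,
            "False Positive (FP)": fp,
            "False Negative (FN)": fn,
            "True Positive (TP)": tp,
        },
    }


entry = group_summary_sketch(tn=40, fp=10, fn=5, tp=45)
print(entry["Samples in Positive class"])  # 55
print(entry["Samples in Negative class"])  # 45
```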

visualizer

unfair_data_generator.util.visualizer.visualize_TPR_FPR_metrics(metrics, title)

Create bar charts to visualize True Positive Rate (TPR) and False Positive Rate (FPR) for different groups.

Parameters:

metrics (dict) – Dictionary containing fairness metrics for each group. Each group should have ‘True Positive Rate (TPR)’ and ‘False Positive Rate (FPR)’ keys.
title (str) – Title for the plot.

Returns:

The matplotlib pyplot object containing the TPR and FPR bar charts.

Return type:

matplotlib.pyplot

unfair_data_generator.util.visualizer.visualize_accuracy(metrics, title)

Create bar charts to visualize accuracy for different groups.

Parameters:

metrics (dict) – Dictionary containing accuracy metrics for each group.
title (str) – Title for the plot.

Returns:

The matplotlib pyplot object containing the accuracy bar chart.

Return type:

matplotlib.pyplot

unfair_data_generator.util.visualizer.visualize_group_classes(X, y, Z, centroids, feature1=None, feature2=None, feature1_name=None, feature2_name=None, title='Group cluster visualization')

Visualize data points for group-class combinations and centroids.

Parameters:

X (ndarray) – Feature matrix. When using default features, the first two columns are used.
y (ndarray) – Target class labels.
Z (ndarray) – Sensitive group labels for each sample.
centroids (dict) – Dictionary of group-specific centroids.
feature1 (ndarray, optional) – First feature for visualization. If None, uses X[:, 0].
feature2 (ndarray, optional) – Second feature for visualization. If None, uses X[:, 1].
feature1_name (str, optional) – Name for the x-axis label. Defaults to “Feature 1” or “Custom Feature 1”.
feature2_name (str, optional) – Name for the y-axis label. Defaults to “Feature 2” or “Custom Feature 2”.
title (str, default="Group cluster visualization") – Title for the plot.

Returns:

The matplotlib pyplot object containing the group cluster visualization.

Return type:

matplotlib.pyplot

Raises:

ValueError – If feature1 and feature2 are both provided but are the same.
ValueError – If only one of feature1 and feature2 is provided. Supply both, or leave both as None for the default behavior.
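The feature-selection rules and the two error conditions can be sketched as a small helper. This illustrates the documented contract under stated assumptions: it uses plain lists instead of ndarrays, it treats "the same" as element-wise equality, and `resolve_features_sketch` is a hypothetical name that does not exist in the library.

```python
def resolve_features_sketch(X, feature1=None, feature2=None):
    """Pick the two plotted features, mirroring the documented rules."""
    # Exactly one of the two supplied: the pair is incomplete.
    if (feature1 is None) != (feature2 is None):
        raise ValueError(
            "feature1 and feature2 must be provided together, "
            "or both left as None."
        )
    if feature1 is None:
        # Default behavior: use the first two columns of X.
        return [row[0] for row in X], [row[1] for row in X]
    # Assumption: "the same" means equal values, element by element.
    if list(feature1) == list(feature2):
        raise ValueError("feature1 and feature2 must be different.")
    return feature1, feature2


X = [[1.0, 2.0, 9.0], [3.0, 4.0, 9.0]]
print(resolve_features_sketch(X))  # ([1.0, 3.0], [2.0, 4.0])
```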

unfair_data_generator.util.visualizer.visualize_groups_separately(X, y, Z, feature1=None, feature2=None, feature1_name=None, feature2_name=None, title='Group-specific visualization')

Generate scatter plots for individual groups, showing data points for each class.

Parameters:

X (ndarray) – Feature matrix. When using default features, the first two columns are used.
y (ndarray) – Target class labels.
Z (ndarray) – Sensitive group labels for each sample.
feature1 (ndarray, optional) – First feature for visualization. If None, uses X[:, 0].
feature2 (ndarray, optional) – Second feature for visualization. If None, uses X[:, 1].
feature1_name (str, optional) – Name for the first feature axis label. Defaults to “Feature 1”.
feature2_name (str, optional) – Name for the second feature axis label. Defaults to “Feature 2”.
title (str, default="Group-specific visualization") – Base title for the plots.

Returns:

Dictionary containing matplotlib figure objects for each group.

Return type:

dict

Raises:

ValueError – If feature1 and feature2 are both provided but are the same.
ValueError – If only one of feature1 and feature2 is provided. Supply both, or leave both as None for the default behavior.