ML Metrics/Loss

Extension for ML Metrics/Losses

Machine Learning / Time series Loss and Evaluation Metrics

Functions:

Name Description
query_adj_r2

Returns the adjusted coefficient of determination for a regression model.

query_binary_metrics

Computes the following binary classification metrics using actual as the true labels and pred as predictions: precision, recall, f, average_precision and roc_auc.

query_cat_cross_entropy

Returns the categorical cross entropy. To avoid numerical errors from the log, set pred = pred + epsilon.

query_confusion_matrix

Computes the binary confusion matrix given the true labels (actual) and the predicted labels (computed from pred and threshold).

query_dcg_score

Calculates the Discounted Cumulative Gain score.

query_hubor_loss

Computes Huber loss between actual and pred.

query_l1

Returns L1 loss, aka. mean absolute error.

query_l2

Returns squared L2 loss, aka. mean squared error.

query_l_inf

Returns L Inf loss.

query_log_cosh

Computes log cosh of the prediction error, which is a smooth variation of MAE (L1 loss).

query_log_loss

Computes log loss, aka binary cross entropy loss, between self and other pred expression.

query_mad

Computes the Mean/median Absolute Deviation.

query_mape

Computes mean absolute percentage error between self and the other pred expression.

query_mase

Computes the Mean/Median Absolute Scaled Error. This is the time series version in the reference article.

query_mcc

Returns the Matthews correlation coefficient (phi coefficient). The inputs must be 0s and 1s and castable to u32.

query_msle

Computes the mean square log error between this and the other pred expression.

query_multi_roc_auc

Computes multiclass ROC AUC. The actual labels must be integer values in the range [0, n_classes).

query_ndcg_score

Compute Normalized Discounted Cumulative Gain.

query_r2

Returns the coefficient of determination for a regression model.

query_roc_auc

Computes ROC AUC using self as actual and pred as predictions.

query_smape

Computes symmetric mean absolute percentage error between self and other pred expression.

query_tpr_fpr

Returns the TPR and FPR for all thresholds. This is useful when you want to study the thresholds

query_adj_r2(actual, pred, p)

Returns the adjusted coefficient of determination for a regression model.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual values

required
pred str | Expr

A Polars expression representing predictions

required
p int

The number of explanatory variables

required
Source code in python/polars_ds/exprs/metrics.py
def query_adj_r2(actual: str | pl.Expr, pred: str | pl.Expr, p: int) -> pl.Expr:
    """
    Returns the adjusted coefficient of determination for a regression model.

    Parameters
    ----------
    actual
        An expression representing the actual values
    pred
        A Polars expression representing predictions
    p
        The number of explanatory variables
    """
    actual_expr = to_expr(actual)
    pred_expr = to_expr(pred)
    diff = actual_expr - pred_expr
    ss_res = diff.dot(diff)
    diff2 = actual_expr - actual_expr.mean()
    ss_tot = diff2.dot(diff2)
    df_res = actual_expr.len() - p
    df_tot = actual_expr.len() - 1
    return 1.0 - (ss_res / df_res) / (ss_tot / df_tot)
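
The arithmetic in the expression above can be checked by hand with a plain-Python sketch. This is not the library's implementation; `adj_r2` and the sample data are hypothetical names used only for illustration. Note that the source divides the residual sum of squares by `n - p`, whereas a common textbook form uses `n - p - 1` when `p` excludes the intercept.

```python
def adj_r2(actual, pred, p):
    """Adjusted R^2 with the same degrees of freedom as the source above."""
    n = len(actual)
    mean_actual = sum(actual) / n
    ss_res = sum((a - y) ** 2 for a, y in zip(actual, pred))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    df_res = n - p  # residual degrees of freedom, mirroring the source
    df_tot = n - 1  # total degrees of freedom
    return 1.0 - (ss_res / df_res) / (ss_tot / df_tot)


actual = [1.0, 2.0, 3.0, 4.0, 5.0]
pred = [1.1, 1.9, 3.2, 3.8, 5.0]
score = adj_r2(actual, pred, p=2)
print(score)
```

Here the residual sum of squares is 0.1 and the total sum of squares is 10, so the score is 1 - (0.1 / 3) / (10 / 4).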

query_binary_metrics(actual, pred, threshold=0.5)

Computes the following binary classification metrics using actual as the true labels and pred as predictions: precision, recall, f, average_precision and roc_auc. The result is a struct whose fields have the names given here.

The actual column must be binary and castable to UInt32. If it is not all 0s and 1s, the result will not make sense, or an error may occur. If there is no positive class in the data, NaN or another numerical error may occur.

Average precision is computed using Sum (R_n - R_n-1)*P_n-1, which is not the textbook definition, but is consistent with Scikit-learn. For more information, see https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual labels

required
pred str | Expr

An expression representing the column with predicted probabilities.

required
threshold float

The threshold used to compute precision, recall and f (f score).

0.5
Source code in python/polars_ds/exprs/metrics.py
def query_binary_metrics(
    actual: str | pl.Expr, pred: str | pl.Expr, threshold: float = 0.5
) -> pl.Expr:
    """
    Computes the following binary classification metrics using actual as the true labels and pred
    as predictions: precision, recall, f, average_precision and roc_auc. The result is a struct
    whose fields have the names given here.

    The actual column must be binary and castable to UInt32. If it is not all 0s and 1s,
    the result will not make sense, or an error may occur. If there is no positive class in the data,
    NaN or another numerical error may occur.

    Average precision is computed using Sum (R_n - R_n-1)*P_n-1, which is not the textbook definition,
    but is consistent with Scikit-learn. For more information, see
    https://scikit-learn.org/stable/modules/generated/sklearn.metrics.average_precision_score.html

    Parameters
    ----------
    actual
        An expression representing the actual labels
    pred
        An expression representing the column with predicted probabilities.
    threshold
        The threshold used to compute precision, recall and f (f score).
    """
    return pl_plugin(
        symbol="pl_combo_b",
        args=[
            to_expr(actual).cast(pl.UInt32),
            to_expr(pred),
            pl.lit(threshold, dtype=pl.Float64),
        ],
        returns_scalar=True,
    )

query_cat_cross_entropy(actual, pred, normalize=True, dense=True)

Returns the categorical cross entropy. To avoid numerical errors from the log, set pred = pred + epsilon.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual labels

required
pred str | Expr

An expression representing the predicted probabilities for the classes. Must be of List/arr[f64] type.

required
normalize bool

Whether to divide by N.

True
dense bool

If true, actual has to be a dense vector (a single number for each row, starting from 0). If false, it has to be a column of lists/arrs with only one 1 and 0s otherwise.

True
Source code in python/polars_ds/exprs/metrics.py
def query_cat_cross_entropy(
    actual: str | pl.Expr, pred: str | pl.Expr, normalize: bool = True, dense: bool = True
) -> pl.Expr:
    """
    Returns the categorical cross entropy. To avoid numerical errors from the log,
    set pred = pred + epsilon.

    Parameters
    ----------
    actual
        An expression representing the actual labels
    pred
        An expression representing the predicted probabilities for the classes. Must be of List/arr[f64] type.
    normalize
        Whether to divide by N.
    dense
        If true, actual has to be a dense vector (a single number for each row, starting from 0). If false, it has
        to be a column of lists/arrs with only one 1 and 0s otherwise.
    """
    a = to_expr(actual)
    p = to_expr(pred)
    if dense:
        y_prob = p.list.get(a)
    else:
        y_prob = p.list.get(a.list.arg_max())
    if normalize:
        return -y_prob.log().sum() / a.count()
    return -y_prob.log().sum()
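
The difference between the dense and one-hot label layouts can be illustrated with a plain-Python sketch. It mirrors the formula in the source but is not the library's code; `cat_cross_entropy` and the sample data are hypothetical.

```python
import math


def cat_cross_entropy(actual, pred, normalize=True, dense=True):
    """Categorical cross entropy over rows of class probabilities."""
    total = 0.0
    for label, probs in zip(actual, pred):
        # dense: the label is the class index itself.
        # otherwise: the label is a one-hot list, so take the position of its 1.
        idx = label if dense else max(range(len(label)), key=label.__getitem__)
        total -= math.log(probs[idx])
    return total / len(actual) if normalize else total


pred = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
dense_loss = cat_cross_entropy([0, 1], pred)
onehot_loss = cat_cross_entropy([[1, 0, 0], [0, 1, 0]], pred, dense=False)
print(dense_loss, onehot_loss)
```

Both calls describe the same labels, so they give the same loss: the mean of -log(0.7) and -log(0.8).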

query_confusion_matrix(actual, pred, threshold=0.5, all_metrics=False)

Computes the binary confusion matrix given the true labels (actual) and the predicted labels (computed from pred, a column of predicted scores and threshold). When a divide by zero is encountered, NaN is returned.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual labels. Must be castable to boolean

required
pred str | Expr

An expression representing the column with predicted probability

required
threshold float

The threshold used to compute the predicted labels, by default 0.5

0.5
all_metrics bool

If True, compute all 25 possible confusion matrix statistics instead of just True Positive, False Positive, True Negative, False Negative, by default False

False

Returns:

Type Description
Expr

A struct of confusion matrix metrics

Examples:

Limited to just the basic confusion matrix

>>> df = pl.DataFrame({"actual": [1, 0, 1], "pred": [0.4, 0.6, 0.9]})
>>> df.select(pds.query_confusion_matrix("actual", "pred").alias("metrics")).unnest("metrics")
shape: (1, 4)
┌─────┬─────┬─────┬─────┐
│ tn  ┆ fp  ┆ fn  ┆ tp  │
│ --- ┆ --- ┆ --- ┆ --- │
│ u32 ┆ u32 ┆ u32 ┆ u32 │
╞═════╪═════╪═════╪═════╡
│ 0   ┆ 1   ┆ 1   ┆ 1   │
└─────┴─────┴─────┴─────┘

With all_metrics set to True

>>> df.select(
...     pds.query_confusion_matrix("actual", "pred", all_metrics=True).alias("metrics")
... ).unnest("metrics")
shape: (1, 25)
┌─────┬─────┬─────┬─────┬───┬────────────┬─────┬─────┬─────┐
│ tn  ┆ fp  ┆ fn  ┆ tp  ┆ … ┆ markedness ┆ fdr ┆ npv ┆ dor │
│ --- ┆ --- ┆ --- ┆ --- ┆   ┆ ---        ┆ --- ┆ --- ┆ --- │
│ u32 ┆ u32 ┆ u32 ┆ u32 ┆   ┆ f64        ┆ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╪═════╪═══╪════════════╪═════╪═════╪═════╡
│ 0   ┆ 1   ┆ 1   ┆ 1   ┆ … ┆ -0.5       ┆ 0.5 ┆ 0.0 ┆ NaN │
└─────┴─────┴─────┴─────┴───┴────────────┴─────┴─────┴─────┘
Source code in python/polars_ds/exprs/metrics.py
def query_confusion_matrix(
    actual: str | pl.Expr,
    pred: str | pl.Expr,
    threshold: float = 0.5,
    all_metrics: bool = False,
) -> pl.Expr:
    """
    Computes the binary confusion matrix given the true labels (`actual`) and
    the predicted labels (computed from `pred`, a column of predicted scores and
    `threshold`). When a divide by zero is encountered, NaN is returned.

    Parameters
    ----------
    actual : str | pl.Expr
        An expression representing the actual labels. Must be castable to boolean
    pred : str | pl.Expr
        An expression representing the column with predicted probability
    threshold : float, optional
        The threshold used to compute the predicted labels, by default 0.5
    all_metrics : bool, optional
        If True, compute all 25 possible confusion matrix statistics instead of
        just True Positive, False Positive, True Negative, False Negative,
        by default False

    Returns
    -------
    pl.Expr
        A struct of confusion matrix metrics

    Examples
    --------
    Limited to just the basic confusion matrix

    >>> df = pl.DataFrame({"actual": [1, 0, 1], "pred": [0.4, 0.6, 0.9]})
    >>> df.select(pds.query_confusion_matrix("actual", "pred").alias("metrics")).unnest("metrics")
    shape: (1, 4)
    ┌─────┬─────┬─────┬─────┐
    │ tn  ┆ fp  ┆ fn  ┆ tp  │
    │ --- ┆ --- ┆ --- ┆ --- │
    │ u32 ┆ u32 ┆ u32 ┆ u32 │
    ╞═════╪═════╪═════╪═════╡
    │ 0   ┆ 1   ┆ 1   ┆ 1   │
    └─────┴─────┴─────┴─────┘

    With `all_metrics` set to True

    >>> df.select(
    ...     pds.query_confusion_matrix("actual", "pred", all_metrics=True).alias("metrics")
    ... ).unnest("metrics")
    shape: (1, 25)
    ┌─────┬─────┬─────┬─────┬───┬────────────┬─────┬─────┬─────┐
    │ tn  ┆ fp  ┆ fn  ┆ tp  ┆ … ┆ markedness ┆ fdr ┆ npv ┆ dor │
    │ --- ┆ --- ┆ --- ┆ --- ┆   ┆ ---        ┆ --- ┆ --- ┆ --- │
    │ u32 ┆ u32 ┆ u32 ┆ u32 ┆   ┆ f64        ┆ f64 ┆ f64 ┆ f64 │
    ╞═════╪═════╪═════╪═════╪═══╪════════════╪═════╪═════╪═════╡
    │ 0   ┆ 1   ┆ 1   ┆ 1   ┆ … ┆ -0.5       ┆ 0.5 ┆ 0.0 ┆ NaN │
    └─────┴─────┴─────┴─────┴───┴────────────┴─────┴─────┴─────┘
    """
    # Cast to bool first to check the label is in correct format. Then back to u32.
    act = to_expr(actual).cast(pl.Boolean).cast(pl.UInt32)
    p = to_expr(pred).gt(threshold).cast(pl.UInt32)
    res = pl_plugin(
        symbol="pl_binary_confusion_matrix",
        args=[(2 * act) + p],  # See Rust code for bincount trick
        returns_scalar=True,
    )
    if all_metrics:
        return res
    else:
        return pl.struct(
            res.struct.field("tn"),
            res.struct.field("fp"),
            res.struct.field("fn"),
            res.struct.field("tp"),
        )
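
The `(2 * act) + p` argument encodes each row as a number from 0 to 3, so counting the codes yields the four confusion-matrix cells. A plain-Python sketch of that idea, assuming the code-to-cell mapping implied by the output column order (`confusion_counts` is a hypothetical helper, not part of the library):

```python
def confusion_counts(actual, pred, threshold=0.5):
    """Count confusion-matrix cells via the code 2 * actual + predicted."""
    counts = [0, 0, 0, 0]  # index 0 = tn, 1 = fp, 2 = fn, 3 = tp
    for a, score in zip(actual, pred):
        label = 1 if a else 0
        predicted = 1 if score > threshold else 0
        counts[2 * label + predicted] += 1
    return dict(zip(("tn", "fp", "fn", "tp"), counts))


result = confusion_counts([1, 0, 1], [0.4, 0.6, 0.9])
print(result)
```

With the same three rows as the docstring example, this gives tn=0, fp=1, fn=1, tp=1.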

query_dcg_score(y_true, y_score, log_base=2.0, k=None, ignore_ties=False)

Calculates the Discounted Cumulative Gain score.

Parameters:

Name Type Description Default
y_true str | Expr

This is often called the relevance column. The relevance score represents the true relevance/importance of each item.

required
y_score str | Expr

The name/expr of the column containing the predicted scores used for ranking the items.

required
log_base float

The log base used in the discount factor

2.0
k int | None

The number of top items to consider in the DCG calculation. If None, all items are considered. Defaults to None.

None
ignore_ties bool

If True, assumes there are no ties in y_score and ranks items purely by sorted order, which is cheaper. If False, items with tied scores share the average relevance of their tie group. Defaults to False.

False
Source code in python/polars_ds/exprs/metrics.py
def query_dcg_score(
    y_true: str | pl.Expr,
    y_score: str | pl.Expr,
    log_base: float = 2.0,
    k: int | None = None,
    ignore_ties: bool = False,
) -> pl.Expr:
    """
    Calculates the Discounted Cumulative Gain score.

    Parameters
    ----------
    y_true:
        This is often called the `relevance` column. The relevance score represents the
        true relevance/importance of each item.
    y_score:
        The name/expr of the column containing the predicted scores used for ranking the items.
    log_base:
        The log base used in the discount factor
    k:
        The number of top items to consider in the DCG calculation. If None, all items are
        considered. Defaults to None.
    ignore_ties:
        If True, assumes there are no ties in y_score and ranks items purely by sorted order, which
        is cheaper. If False, items with tied scores share the average relevance of their tie group.
        Defaults to False.
    """

    yt = to_expr(y_true)
    ys = to_expr(y_score)

    range_ = pl.int_range(1, pl.len() + 1)
    discount = math.log(log_base) / range_.log1p()
    if k is not None:
        discount = discount * (range_ <= k).cast(pl.Float64)

    if ignore_ties:
        ranking = ys.arg_sort().reverse()
        return (yt.gather(ranking)).dot(discount)
    else:
        discount_cumsum = discount.cum_sum()
        # inv, and counts are equivalent to
        # _, inv, counts = np.unique(-y_score, return_inverse=True, return_counts=True)
        counts = ys.sort(descending=True).unique_counts()
        inv = (ys.rank(method="dense", descending=True) - 1).set_sorted(descending=True)
        # The add_at function is an equivalent to numpy.add.at
        n_unique = counts.len()
        ranked = add_at(inv, yt, n_unique) / counts
        groups = counts.cum_sum() - 1

        # PL VERSION
        major, minor, patch = list(map(lambda v: int(v), pl.__version__.split(".")))
        if (major, minor, patch) < (1, 32, 3):
            discount_sums = (
                discount_cumsum.gather(groups)
                .diff()
                .fill_null(discount_cumsum.get(groups.first()).first())
            )  # The only possible null is at position 0

        else:  # in newer versions, get() returns a value which auto resolves to a scalar
            discount_sums = (
                discount_cumsum.gather(groups).diff().fill_null(discount_cumsum.get(groups.first()))
            )  # The only possible null is at position 0

        return ranked.dot(discount_sums)
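
For the simpler `ignore_ties=True` path, the computation reduces to sorting by score and summing relevance times a logarithmic discount. A plain-Python sketch under that assumption (the helper name `dcg_ignore_ties` and the data are hypothetical, and tie averaging is deliberately left out):

```python
import math


def dcg_ignore_ties(y_true, y_score, log_base=2.0, k=None):
    """DCG with items ranked by descending score; ties are not averaged."""
    order = sorted(range(len(y_score)), key=lambda i: y_score[i], reverse=True)
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if k is not None and rank > k:
            break  # items beyond the top k contribute nothing
        # discount = 1 / log_base(1 + rank), written as in the source
        total += y_true[i] * math.log(log_base) / math.log1p(rank)
    return total


relevance = [3.0, 2.0, 0.0, 1.0]
scores = [0.9, 0.3, 0.5, 0.1]
full = dcg_ignore_ties(relevance, scores)
top2 = dcg_ignore_ties(relevance, scores, k=2)
print(full, top2)
```

Sorted by score, the relevances are 3, 0, 2, 1, so the full score is 3 + 0 + 2 / log2(4) + 1 / log2(5), and the top-2 score is 3.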

query_gini(actual, pred)

Computes the Gini coefficient. This is 2 * AUC - 1.

The actual column must be binary and castable to UInt32. If it is not all 0s and 1s, the result will not make sense, or an error may occur. If no positive class exists in the data, NaN will be returned.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual labels

required
pred str | Expr

An expression representing the column with predicted probabilities.

required
Source code in python/polars_ds/exprs/metrics.py
def query_gini(actual: str | pl.Expr, pred: str | pl.Expr) -> pl.Expr:
    """
    Computes the Gini coefficient. This is 2 * AUC - 1.

    The actual column must be binary and castable to UInt32. If it is not all 0s and 1s,
    the result will not make sense, or an error may occur. If no positive class exists in the data,
    NaN will be returned.

    Parameters
    ----------
    actual
        An expression representing the actual labels
    pred
        An expression representing the column with predicted probabilities.
    """
    return query_roc_auc(actual, pred) * 2.0 - 1.0
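
The relation Gini = 2 * AUC - 1 can be checked on a tiny data set. The sketch below computes AUC by brute-force pair counting, which is only practical for small inputs and is not necessarily how the library computes it; both helper names are hypothetical.

```python
def roc_auc_pairwise(actual, pred):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for a, s in zip(actual, pred) if a == 1]
    neg = [s for a, s in zip(actual, pred) if a == 0]
    # A correctly ordered pair counts 1, a tied pair counts 0.5.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(actual, pred):
    return 2.0 * roc_auc_pairwise(actual, pred) - 1.0


g = gini([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(g)
```

Three of the four positive/negative pairs are ordered correctly, so AUC is 0.75 and the Gini coefficient is 0.5.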

query_hubor_loss(actual, pred, delta)

Computes Huber loss between actual and pred. The loss depends only on the absolute error, so the order of the two arguments does not affect the result.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the column with actual target.

required
pred str | Expr

An expression representing the column with predicted values.

required
delta float

The threshold at which the loss changes from quadratic to linear.

required
Source code in python/polars_ds/exprs/metrics.py
def query_hubor_loss(actual: str | pl.Expr, pred: str | pl.Expr, delta: float) -> pl.Expr:
    """
    Computes Huber loss between actual and pred. The loss depends only on the
    absolute error, so the order of the two arguments does not affect the result.

    Parameters
    ----------
    actual
        An expression representing the column with actual target.
    pred
        An expression representing the column with predicted values.
    delta
        The threshold at which the loss changes from quadratic to linear.
    """
    a, p = to_expr(actual), to_expr(pred)
    temp = (a - p).abs()
    return (
        pl.when(temp <= delta).then(0.5 * temp.pow(2)).otherwise(delta * (temp - 0.5 * delta)).sum()
        / a.count()
    )
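
The piecewise definition in the source (quadratic up to `delta`, linear beyond it) can be traced with a plain-Python sketch. `huber_loss` here is a hypothetical stand-in that mirrors the formula, not the library function.

```python
def huber_loss(actual, pred, delta):
    """Mean Huber loss: quadratic for small errors, linear for large ones."""
    total = 0.0
    for a, p in zip(actual, pred):
        err = abs(a - p)
        if err <= delta:
            total += 0.5 * err ** 2
        else:
            total += delta * (err - 0.5 * delta)
    return total / len(actual)


loss = huber_loss([1.0, 2.0, 3.0], [1.5, 2.0, 6.0], delta=1.0)
print(loss)
```

The three errors are 0.5, 0 and 3, contributing 0.125, 0 and 2.5, so the mean is 0.875.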

query_l1(actual, pred, normalize=True)

Returns L1 loss, aka. mean absolute error.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual values

required
pred str | Expr

A Polars expression representing predictions

required
normalize bool

Whether to divide by N. Nulls won't be counted in N.

True
Source code in python/polars_ds/exprs/metrics.py
def query_l1(actual: str | pl.Expr, pred: str | pl.Expr, normalize: bool = True) -> pl.Expr:
    """
    Returns L1 loss, aka. mean absolute error.

    Parameters
    ----------
    actual
        An expression representing the actual values
    pred
        A Polars expression representing predictions
    normalize
        Whether to divide by N. Nulls won't be counted in N.
    """
    a = to_expr(actual)
    p = to_expr(pred)
    if normalize:
        return (a - p).abs().sum() / a.count()
    return (a - p).abs().sum()

query_l2(actual, pred, normalize=True)

Returns squared L2 loss, aka. mean squared error.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual values

required
pred str | Expr

A Polars expression representing predictions

required
normalize bool

Whether to divide by N.

True
Source code in python/polars_ds/exprs/metrics.py
def query_l2(actual: str | pl.Expr, pred: str | pl.Expr, normalize: bool = True) -> pl.Expr:
    """
    Returns squared L2 loss, aka. mean squared error.

    Parameters
    ----------
    actual
        An expression representing the actual values
    pred
        A Polars expression representing predictions
    normalize
        Whether to divide by N.
    """
    a = to_expr(actual)
    p = to_expr(pred)
    diff = a - p
    if normalize:
        return diff.dot(diff) / a.count()
    return diff.dot(diff)

query_l_inf(actual, pred)

Returns L Inf loss.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual values

required
pred str | Expr

A Polars expression representing predictions

required
Source code in python/polars_ds/exprs/metrics.py
def query_l_inf(actual: str | pl.Expr, pred: str | pl.Expr) -> pl.Expr:
    """
    Returns L Inf loss.

    Parameters
    ----------
    actual
        An expression representing the actual values
    pred
        A Polars expression representing predictions
    """
    a = to_expr(actual)
    p = to_expr(pred)
    return (a - p).abs().max()
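
The three norm-based losses differ only in how the per-row errors are aggregated. A plain-Python sketch comparing them on the same data (the helper names are hypothetical, and null handling is ignored):

```python
def l1_loss(actual, pred, normalize=True):
    """Sum (or mean) of absolute errors."""
    total = sum(abs(a - p) for a, p in zip(actual, pred))
    return total / len(actual) if normalize else total


def l2_loss(actual, pred, normalize=True):
    """Sum (or mean) of squared errors."""
    total = sum((a - p) ** 2 for a, p in zip(actual, pred))
    return total / len(actual) if normalize else total


def l_inf_loss(actual, pred):
    """Largest absolute error."""
    return max(abs(a - p) for a, p in zip(actual, pred))


actual = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.5, 2.0, 6.0]
mae = l1_loss(actual, pred)
mse = l2_loss(actual, pred)
worst = l_inf_loss(actual, pred)
print(mae, mse, worst)
```

The absolute errors are 0, 0.5, 1 and 2, giving a mean absolute error of 0.875, a mean squared error of 1.3125 and an L Inf loss of 2.0.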

query_log_cosh(actual, pred, normalize=True)

Computes log cosh of the prediction error, which is a smooth variation of MAE (L1 loss).

Source code in python/polars_ds/exprs/metrics.py
def query_log_cosh(actual: str | pl.Expr, pred: str | pl.Expr, normalize: bool = True) -> pl.Expr:
    """
    Computes log cosh of the prediction error, which is a smooth variation of MAE (L1 loss).
    """
    a, p = to_expr(actual), to_expr(pred)
    if normalize:
        return (p - a).cosh().log().sum() / a.count()
    return (p - a).cosh().log().sum()
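
Why log cosh counts as a smooth variant of MAE can be seen numerically: for a small error e it behaves like e^2 / 2, and for a large error like |e| - log(2). A plain-Python sketch (`log_cosh` is a hypothetical helper mirroring the formula in the source):

```python
import math


def log_cosh(actual, pred, normalize=True):
    """Sum (or mean) of log(cosh(pred - actual))."""
    total = sum(math.log(math.cosh(p - a)) for a, p in zip(actual, pred))
    return total / len(actual) if normalize else total


small = log_cosh([0.0], [0.01])  # close to 0.01 ** 2 / 2
large = log_cosh([0.0], [10.0])  # close to 10 - log(2)
print(small, large)
```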

query_log_loss(actual, pred, normalize=True)

Computes log loss, aka binary cross entropy loss, between actual and pred.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual labels

required
pred str | Expr

An expression representing the column with predicted probabilities.

required
normalize bool

Whether to divide by N.

True
Source code in python/polars_ds/exprs/metrics.py
def query_log_loss(actual: str | pl.Expr, pred: str | pl.Expr, normalize: bool = True) -> pl.Expr:
    """
    Computes log loss, aka binary cross entropy loss, between `actual` and `pred`.

    Parameters
    ----------
    actual
        An expression representing the actual labels
    pred
        An expression representing the column with predicted probabilities.
    normalize
        Whether to divide by N.
    """
    a = to_expr(actual).cast(pl.Float64)
    p = to_expr(pred).cast(pl.Float64)
    first = pl_plugin(
        args=[a, p],
        symbol="pl_xlogy",
        is_elementwise=True,
    )
    second = pl_plugin(
        args=[pl.lit(1.0, dtype=pl.Float64) - a, pl.lit(1.0, dtype=pl.Float64) - p],
        symbol="pl_xlogy",
        is_elementwise=True,
    )

    if normalize:
        return -(first + second).mean()
    return -(first + second).sum()
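
The two plugin calls compute a*log(p) and (1-a)*log(1-p). The sketch below assumes `pl_xlogy` follows the usual xlogy convention of returning 0 when the first argument is 0; that convention is an assumption, not something this page confirms. All names are hypothetical.

```python
import math


def xlogy(x, y):
    """x * log(y), defined as 0 when x is 0 (assumed convention)."""
    return 0.0 if x == 0.0 else x * math.log(y)


def log_loss(actual, pred, normalize=True):
    """Binary cross entropy built from two xlogy terms per row."""
    terms = [xlogy(a, p) + xlogy(1.0 - a, 1.0 - p) for a, p in zip(actual, pred)]
    total = -sum(terms)
    return total / len(terms) if normalize else total


loss = log_loss([1.0, 0.0, 1.0], [0.9, 0.2, 1.0])
print(loss)
```

The third row has a prediction of exactly 1.0 for a positive label. Under this convention it contributes 0 rather than failing on log(0), so the result is the mean of -log(0.9), -log(0.8) and 0.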

query_mad(x, use_mean=True)

Computes the Mean/median Absolute Deviation.

Parameters:

Name Type Description Default
x str | Expr

An expression representing the input values

required
use_mean bool

If true, computes mean absolute deviation. If false, use median instead of mean.

True
Source code in python/polars_ds/exprs/metrics.py
def query_mad(x: str | pl.Expr, use_mean: bool = True) -> pl.Expr:
    """
    Computes the Mean/median Absolute Deviation.

    Parameters
    ----------
    x
        An expression representing the input values
    use_mean
        If true, computes mean absolute deviation. If false, use median instead of mean.
    """
    xx = to_expr(x)
    if use_mean:
        return (xx - xx.mean()).abs().mean()
    else:
        return (xx - xx.median()).abs().median()
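
The two variants respond very differently to outliers, which a small plain-Python sketch makes visible. `mad` here is a hypothetical helper following the same formula as the source.

```python
from statistics import mean, median


def mad(x, use_mean=True):
    """Mean absolute deviation, or median absolute deviation if use_mean is False."""
    center = mean(x) if use_mean else median(x)
    deviations = [abs(v - center) for v in x]
    return mean(deviations) if use_mean else median(deviations)


data = [1.0, 2.0, 3.0, 4.0, 100.0]
mean_version = mad(data)
median_version = mad(data, use_mean=False)
print(mean_version, median_version)
```

With the outlier 100.0 in the data, the mean version is 31.2 while the median version is 1.0.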

query_mape(actual, pred, weighted=False)

Computes mean absolute percentage error between actual and pred. If weighted, it will compute the weighted version as defined here:

https://en.wikipedia.org/wiki/Mean_absolute_percentage_error

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual values

required
pred str | Expr

An expression representing the column with predicted values.

required
weighted bool

If true, computes wMAPE as defined in the Wikipedia article

False
Source code in python/polars_ds/exprs/metrics.py
def query_mape(actual: str | pl.Expr, pred: str | pl.Expr, weighted: bool = False) -> pl.Expr:
    """
    Computes mean absolute percentage error between `actual` and `pred`.
    If weighted, it will compute the weighted version as defined here:

    https://en.wikipedia.org/wiki/Mean_absolute_percentage_error

    Parameters
    ----------
    actual
        An expression representing the actual values
    pred
        An expression representing the column with predicted values.
    weighted
        If true, computes wMAPE as defined in the Wikipedia article
    """
    a = to_expr(actual)
    p = to_expr(pred)
    if weighted:
        return (a - p).abs().sum() / a.abs().sum()
    else:
        return (1 - p / a).abs().mean()
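
The plain and weighted variants can give noticeably different numbers on the same data, because the weighted form divides total absolute error by total absolute actuals instead of averaging per-row ratios. A plain-Python sketch (`mape` is a hypothetical helper mirroring the source):

```python
def mape(actual, pred, weighted=False):
    """MAPE, or wMAPE when weighted is True."""
    if weighted:
        abs_err = sum(abs(a - p) for a, p in zip(actual, pred))
        return abs_err / sum(abs(a) for a in actual)
    return sum(abs(1 - p / a) for a, p in zip(actual, pred)) / len(actual)


actual = [100.0, 200.0, 400.0]
pred = [110.0, 180.0, 400.0]
plain = mape(actual, pred)
weighted = mape(actual, pred, weighted=True)
print(plain, weighted)
```

The per-row percentage errors are 10%, 10% and 0%, so the plain value is 0.2 / 3, while the weighted value is 30 / 700. As in the source, the unweighted form divides by each actual value, so zeros in actual are a problem.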

query_mase(actual, pred, train, freq=1, use_mean=True)

Computes the Mean/Median Absolute Scaled Error. This is the time series version in the reference article.

Note: typically, train = pl.col('y').filter(pl.col('time') < T), and pred = pl.col('y_pred').filter(pl.col('time') >= T) and actual = pl.col('y').filter(pl.col('time') >= T)

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual values

required
pred str | Expr

A Polars expression representing predictions

required
train str | Expr | float

A Polars expression representing training data values. If train is a float, it is treated as the precomputed naive one-step forecast loss on training as in the definition.

required
freq int

Defaults to 1, which applies to non-seasonal data. Set it to m (> 0) for seasonal data, where m is the length of the season, i.e. the pattern repeats every freq records.

1
use_mean bool

If true, this will compute Mean Absolute Scaled Error. If false, this uses median instead of mean.

True
Reference

https://en.wikipedia.org/wiki/Mean_absolute_scaled_error

Source code in python/polars_ds/exprs/metrics.py
def query_mase(
    actual: str | pl.Expr,
    pred: str | pl.Expr,
    train: str | pl.Expr | float,
    freq: int = 1,
    use_mean: bool = True,
) -> pl.Expr:
    """
    Computes the Mean/Median Absolute Scaled Error. This is the time series version in the reference article.

    Note: typically, train = pl.col('y').filter(pl.col('time') < T), and
    pred = pl.col('y_pred').filter(pl.col('time') >= T) and actual = pl.col('y').filter(pl.col('time') >= T)

    Parameters
    ----------
    actual
        An expression representing the actual values
    pred
        A Polars expression representing predictions
    train
        A Polars expression representing training data values. If train is a float, it is treated
        as the precomputed naive one-step forecast loss on training as in the definition.
    freq
        Defaults to 1, which applies to non-seasonal data. Set it to m (> 0) for seasonal data,
        where m is the length of the season, i.e. the pattern repeats every `freq` records.
    use_mean
        If true, this will compute Mean Absolute Scaled Error. If false, this uses median instead of mean.

    Reference
    ---------
    https://en.wikipedia.org/wiki/Mean_absolute_scaled_error
    """
    if freq < 1:
        raise ValueError("Input `freq` must be >= 1.")

    a: pl.Expr = to_expr(actual)
    p: pl.Expr = to_expr(pred)

    if isinstance(train, float):
        if use_mean:
            numerator = (a - p).abs().mean()
        else:
            numerator = (a - p).abs().median()

        return numerator / pl.lit(train)

    else:
        train_expr = to_expr(train)
        if use_mean:
            numerator = (a - p).abs().mean()
            denom = train_expr.diff(n=freq).abs().mean()
        else:
            numerator = (a - p).abs().median()
            denom = train_expr.diff(n=freq).abs().median()

        return numerator / denom
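
The scaling step divides the forecast error by the error of a naive forecast on the training data, where the naive forecast repeats the value from `freq` steps earlier. A plain-Python sketch of the mean variant (`mase` and the data are hypothetical, and only list inputs are handled):

```python
from statistics import mean


def mase(actual, pred, train, freq=1):
    """Mean absolute scaled error against a naive lag-freq forecast on train."""
    forecast_err = mean([abs(a - p) for a, p in zip(actual, pred)])
    # Naive forecast error: |y_t - y_{t - freq}| over the training series.
    naive_err = mean([abs(train[i] - train[i - freq]) for i in range(freq, len(train))])
    return forecast_err / naive_err


train = [10.0, 12.0, 11.0, 13.0, 12.0]
actual = [14.0, 13.0]
pred = [13.0, 13.5]
non_seasonal = mase(actual, pred, train)
seasonal = mase(actual, pred, train, freq=2)
print(non_seasonal, seasonal)
```

The forecast error is 0.75. The lag-1 naive error on the training data is 1.5, giving 0.5, and the lag-2 naive error is 1.0, giving 0.75. A value below 1 means the forecast beats the naive baseline.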

query_mcc(y_true, y_pred)

Returns the Matthews correlation coefficient (phi coefficient). The inputs must be 0s and 1s and castable to u32. If not, the result may not be correct. See query_confusion_matrix for querying all the confusion metrics at the same time.

Parameters:

Name Type Description Default
y_true str | Expr

The true labels. Must be 0s and 1s.

required
y_pred str | Expr

The predicted labels. Must be 0s and 1s. For example, (y_prob > 0.5).cast(pl.UInt32)

required
Reference

https://en.wikipedia.org/wiki/Phi_coefficient

Source code in python/polars_ds/exprs/metrics.py
def query_mcc(y_true: str | pl.Expr, y_pred: str | pl.Expr) -> pl.Expr:
    """
    Returns the Matthews correlation coefficient (phi coefficient). The inputs must be 0s and 1s
    and castable to u32. If not, the result may not be correct. See query_confusion_matrix for querying
    all the confusion metrics at the same time.

    Parameters
    ----------
    y_true
        The true labels. Must be 0s and 1s.
    y_pred
        The predicted labels. Must be 0s and 1s. For example, (y_prob > 0.5).cast(pl.UInt32)

    Reference
    ---------
    https://en.wikipedia.org/wiki/Phi_coefficient
    """

    y = to_expr(y_true)
    x = to_expr(y_pred)
    combined = (2 * y + x).cast(pl.UInt32)

    return pl_plugin(
        symbol="pl_mcc",
        args=[combined],
        returns_scalar=True,
    )
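
The combined code `2 * y + x` identifies the four confusion-matrix cells, from which the coefficient follows. A plain-Python sketch of that calculation; the name `mcc` is a hypothetical stand-in, and returning NaN on a zero denominator is an illustrative choice rather than documented plugin behavior.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient from 0/1 labels."""
    counts = [0, 0, 0, 0]  # index 0 = tn, 1 = fp, 2 = fn, 3 = tp
    for t, p in zip(y_true, y_pred):
        counts[2 * t + p] += 1
    tn, fp, fn, tp = counts
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else float("nan")


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
score = mcc(y_true, y_pred)
print(score)
```

Here tp = 3, tn = 3, fp = 1 and fn = 1, so the coefficient is (9 - 1) / 16 = 0.5.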

query_msle(actual, pred, normalize=True)

Computes the mean squared log error between actual and pred.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual values.

required
pred str | Expr

An expression representing the column with predicted values.

required
normalize bool

If true, divide the result by length of the series

True
Source code in python/polars_ds/exprs/metrics.py
def query_msle(actual: str | pl.Expr, pred: str | pl.Expr, normalize: bool = True) -> pl.Expr:
    """
    Computes the mean squared log error between `actual` and `pred`.

    Parameters
    ----------
    actual
        An expression representing the actual values.
    pred
        An expression representing the column with predicted values.
    normalize
        If true, divide the result by length of the series
    """
    a = to_expr(actual)
    p = to_expr(pred)
    diff = a.log1p() - p.log1p()
    out = diff.dot(diff)
    if normalize:
        return out / a.count()
    return out
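
Because the error is taken on log1p-transformed values, the loss measures relative rather than absolute differences. A plain-Python sketch of the same formula (`msle` and the data are hypothetical):

```python
import math


def msle(actual, pred, normalize=True):
    """Sum (or mean) of squared differences of log1p-transformed values."""
    total = sum((math.log1p(a) - math.log1p(p)) ** 2 for a, p in zip(actual, pred))
    return total / len(actual) if normalize else total


# log1p(e - 1) is 1 and log1p(0) is 0, so each row differs by exactly 1 in log space.
actual = [0.0, math.e - 1.0]
pred = [math.e - 1.0, 0.0]
loss = msle(actual, pred)
print(loss)
```

Each row contributes a squared log difference of 1, so the mean is 1 and the unnormalized sum is 2.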

query_multi_roc_auc(actual, pred, n_classes, strategy='weighted')

Computes multiclass ROC AUC. The actual labels must be integer values in the range [0, n_classes), and pred must be a column of list[f64] with size n_classes.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual labels

required
pred str | Expr

The multiclass prediction column

required
n_classes int

The number of classes

required
strategy MultiAUCStrategy

Either macro or weighted, which are defined the same as in Scikit-learn.

'weighted'
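
The two strategies differ only in how the per-class one-vs-rest AUCs are combined. The following plain-Python sketch illustrates that combination. The names `binary_auc` and `multi_auc` are hypothetical, the pairwise-comparison AUC is a simple stand-in for the library's plugin, and returning NaN when a class has no positives or no negatives is an assumption.

```python
def binary_auc(labels, scores):
    # Probability that a random positive outranks a random negative (ties count 0.5)
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        return float("nan")  # assumption: undefined AUC is reported as NaN
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def multi_auc(actual, pred, n_classes, strategy="weighted"):
    if strategy not in ("macro", "weighted"):
        raise NotImplementedError(strategy)
    total = 0.0
    for c in range(n_classes):
        labels = [int(a == c) for a in actual]  # one-vs-rest labels for class c
        scores = [row[c] for row in pred]       # predicted score for class c
        # "macro": every class weighs the same; "weighted": weigh by class frequency
        weight = 1.0 / n_classes if strategy == "macro" else sum(labels) / len(actual)
        total += weight * binary_auc(labels, scores)
    return total


actual = [0, 1, 2, 2]
pred = [[0.7, 0.2, 0.1], [0.2, 0.5, 0.3], [0.1, 0.3, 0.6], [0.3, 0.4, 0.3]]
print(multi_auc(actual, pred, 3, "macro"))
print(multi_auc(actual, pred, 3, "weighted"))  # 0.9375
```

Here classes 0 and 1 are separated perfectly (AUC 1.0) and class 2 has AUC 0.875. Class 2 makes up half of the rows, so the weighted result sits closer to 0.875 than the macro result does.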
Source code in python/polars_ds/exprs/metrics.py
def query_multi_roc_auc(
    actual: str | pl.Expr,
    pred: str | pl.Expr,
    n_classes: int,
    strategy: MultiAUCStrategy = "weighted",
) -> pl.Expr:
    """
    Computes multiclass ROC AUC. `actual` must hold integer labels in the range
    [0, n_classes), and `pred` must be a column of list[f64] with `n_classes` entries per row.

    Parameters
    ----------
    actual
        An expression representing the actual labels
    pred
        The multiclass prediction column
    n_classes
        The number of classes
    strategy
        Either `macro` or `weighted`, which are defined the same as in Scikit-learn.
    """
    a = to_expr(actual)
    p = to_expr(pred)
    if strategy == "macro":
        actuals = [a == i for i in range(n_classes)]
        preds = [p.list.get(i) for i in range(n_classes)]
        return pl.sum_horizontal(query_roc_auc(a, p) for a, p in zip(actuals, preds)) / n_classes
    elif strategy == "weighted":
        actuals = [a == i for i in range(n_classes)]
        preds = [p.list.get(i) for i in range(n_classes)]
        return (
            pl.sum_horizontal(a.sum() * query_roc_auc(a, p) for a, p in zip(actuals, preds))
            / pl.len()
        )
    else:
        raise NotImplementedError

query_ndcg_score(y_true, y_score, k=None, ignore_ties=False)

Compute Normalized Discounted Cumulative Gain.

NDCG is a measure of ranking quality that considers both the relevance of items and their positions in the ranked list. It normalizes the DCG score by the ideal DCG score (the maximum possible DCG for a given set of relevance scores).

Note: (1) ndcg_score should not be used on negative y_true values. (2) This function might run faster in lazy Polars.

Parameters:

Name Type Description Default
y_true str | Expr

This is often called the relevance column. The relevance score represents the true relevance/importance of each item.

required
y_score str | Expr

The name/expr of the column containing the predicted scores used for ranking the items.

required
k int | None

The number of top items to consider in the NDCG calculation. If None, all items are considered. Defaults to None.

None
ignore_ties bool

If True, handles tied scores by averaging their contributions. If False, ranks items with the same score sequentially. Defaults to False.

False

Examples:

Basic usage with a simple ranking task:

>>> df = pl.DataFrame(
...     {
...         "query_id": [1, 1, 1, 2, 2],
...         "relevance": [3, 2, 1, 2, 1],
...         "score": [0.9, 0.8, 0.7, 0.6, 0.5],
...     }
... )
>>> ndcg = (
...     df.group_by("query_id")
...     .agg(ndcg_score=pds.query_ndcg_score("relevance", "score", k=2, ignore_ties=False))
...     .select(pl.col("ndcg_score").mean())
... )
>>> print(ndcg)  # Shows the mean NDCG@2 across all queries

Handling tied scores:

>>> df = pl.DataFrame(
...     {
...         "query_id": [1, 1, 1, 1],
...         "relevance": [3, 2, 2, 1],
...         "score": [0.9, 0.8, 0.8, 0.7],  # Note the tied scores
...     }
... )
>>> # (no need to group by because it is a single query)
>>> ndcg_no_tie_handling = df.select(ndcg=pds.query_ndcg_score("relevance", "score", k=3))
>>> ndcg_tie_handling = df.select(
...     ndcg=pds.query_ndcg_score("relevance", "score", k=3, ignore_ties=True)
... )

Using with recommendation systems:

>>> recommendations = pl.DataFrame(
...     {
...         "user_id": [1, 1, 1, 2, 2, 2],
...         "item_id": ["i1", "i2", "i3", "i4", "i5", "i6"],
...         "relevance": [5, 4, 3, 4, 3, 2],  # User ratings or engagement metrics
...         "pred_score": [0.95, 0.85, 0.75, 0.9, 0.8, 0.7],  # Model predictions
...     }
... )
>>> ndcg = recommendations.group_by("user_id").agg(
...     ndcg=pds.query_ndcg_score("relevance", "pred_score", k=10)
... )
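
To make the normalization concrete, the plain-Python sketch below computes NDCG@k by hand. It is an illustration under stated assumptions rather than the library's implementation: the names `dcg` and `ndcg` are hypothetical, relevance is used directly as the gain, tied scores keep their input order, and an ideal DCG of zero yields 0.0.

```python
import math


def dcg(relevance, scores, k=None):
    # Order relevance by descending score; the sort is stable, so ties keep input order
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked = [relevance[i] for i in order]
    if k is not None:
        ranked = ranked[:k]
    # 1 / log2(1 + rank) is the same discount as log(2) / log1p(rank)
    return sum(r / math.log2(1 + rank) for rank, r in enumerate(ranked, start=1))


def ndcg(relevance, scores, k=None):
    # The ideal DCG ranks items by their true relevance
    ideal = dcg(relevance, relevance, k)
    return dcg(relevance, scores, k) / ideal if ideal > 0 else 0.0


print(ndcg([3, 2, 1], [0.9, 0.8, 0.7], k=2))  # 1.0, the ranking matches the ideal order
print(ndcg([3, 2, 1], [0.7, 0.9, 0.8], k=2))  # below 1, the most relevant item is ranked last
```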
Source code in python/polars_ds/exprs/metrics.py
def query_ndcg_score(
    y_true: str | pl.Expr,
    y_score: str | pl.Expr,
    k: int | None = None,
    ignore_ties: bool = False,
) -> pl.Expr:
    """Compute Normalized Discounted Cumulative Gain.

    NDCG is a measure of ranking quality that considers both the relevance of items
    and their positions in the ranked list. It normalizes the DCG score by the ideal
    DCG score (the maximum possible DCG for a given set of relevance scores).

    Note: (1) ndcg_score should not be used on negative y_true values. (2) This function
    might run faster in lazy Polars.

    Parameters
    ----------
    y_true:
        This is often called the `relevance` column. The relevance score represents the
        true relevance/importance of each item.
    y_score:
        The name/expr of the column containing the predicted scores used for ranking the items.
    k:
        The number of top items to consider in the NDCG calculation. If None, all items are
        considered. Defaults to None.
    ignore_ties:
        If True, handles tied scores by averaging their contributions. If False, ranks items with
        the same score sequentially. Defaults to False.

    Examples
    --------
    Basic usage with a simple ranking task:
    >>> df = pl.DataFrame(
    ...     {
    ...         "query_id": [1, 1, 1, 2, 2],
    ...         "relevance": [3, 2, 1, 2, 1],
    ...         "score": [0.9, 0.8, 0.7, 0.6, 0.5],
    ...     }
    ... )
    >>> ndcg = (
    ...     df.group_by("query_id")
    ...     .agg(ndcg_score=pds.query_ndcg_score("relevance", "score", k=2, ignore_ties=False))
    ...     .select(pl.col("ndcg_score").mean())
    ... )
    >>> print(ndcg)  # Shows the mean NDCG@2 across all queries

    Handling tied scores:
    >>> df = pl.DataFrame(
    ...     {
    ...         "query_id": [1, 1, 1, 1],
    ...         "relevance": [3, 2, 2, 1],
    ...         "score": [0.9, 0.8, 0.8, 0.7],  # Note the tied scores
    ...     }
    ... )
    >>> # (no need to group by because it is a single query)
    >>> ndcg_no_tie_handling = df.select(ndcg=pds.query_ndcg_score("relevance", "score", k=3))
    >>> ndcg_tie_handling = df.select(
    ...     ndcg=pds.query_ndcg_score("relevance", "score", k=3, ignore_ties=True)
    ... )

    Using with recommendation systems:
    >>> recommendations = pl.DataFrame(
    ...     {
    ...         "user_id": [1, 1, 1, 2, 2, 2],
    ...         "item_id": ["i1", "i2", "i3", "i4", "i5", "i6"],
    ...         "relevance": [5, 4, 3, 4, 3, 2],  # User ratings or engagement metrics
    ...         "pred_score": [0.95, 0.85, 0.75, 0.9, 0.8, 0.7],  # Model predictions
    ...     }
    ... )
    >>> ndcg = recommendations.group_by("user_id").agg(
    ...     ndcg=pds.query_ndcg_score("relevance", "pred_score", k=10)
    ... )
    """
    gain = query_dcg_score(y_true, y_score, k=k, ignore_ties=ignore_ties)
    # Lazy Polars should be able to figure out things like
    # `discount` and `yt.sort(descending=True)` are common in the calculation of gain
    # and normalized_gain, and so should be only computed once.
    yt = to_expr(y_true)
    range_ = pl.int_range(1, pl.len() + 1)
    discount = math.log(2.0) / range_.log1p()
    if k is not None:
        discount = discount * (range_ <= k).cast(pl.Float64)
    # this is an optimization we can do because we know yt = ys. No need to compute the
    # ranking variable in the ignore_ties case
    normalizing_gain = (yt.sort(descending=True)).dot(discount)
    # since gain and normalizing_gain are a scalar, NaN means normalizing gain is 0
    return (gain / normalizing_gain).fill_nan(pl.lit(0.0, dtype=pl.Float64))

query_r2(actual, pred)

Returns the coefficient of determination for a regression model.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual values

required
pred str | Expr

A Polars expression representing predictions

required
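
The value is one minus the ratio of the residual sum of squares to the total sum of squares around the mean of the actual values. A minimal plain-Python sketch of that formula follows; the function name `r2` is hypothetical and the sketch assumes the actual values are not all identical, since the total sum of squares would otherwise be zero.

```python
def r2(actual, pred):
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, pred))  # residual sum of squares
    ss_tot = sum((a - mean) ** 2 for a in actual)             # total sum of squares
    return 1.0 - ss_res / ss_tot


actual = [1.0, 2.0, 3.0, 4.0]
print(r2(actual, [1.0, 2.0, 3.0, 5.0]))  # 0.8, since ss_res = 1 and ss_tot = 5
print(r2(actual, [2.5, 2.5, 2.5, 2.5]))  # 0.0, predicting the mean explains nothing
```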
Source code in python/polars_ds/exprs/metrics.py
def query_r2(actual: str | pl.Expr, pred: str | pl.Expr) -> pl.Expr:
    """
    Returns the coefficient of determination for a regression model.

    Parameters
    ----------
    actual
        An expression representing the actual values
    pred
        A Polars expression representing predictions
    """
    a = to_expr(actual)
    p = to_expr(pred)
    diff = a - p
    ss_res = diff.dot(diff)
    diff2 = a - a.mean()
    ss_tot = diff2.dot(diff2)
    return 1.0 - ss_res / ss_tot

query_roc_auc(actual, pred)

Computes ROC AUC using actual as the true labels and pred as the predictions.

actual must be binary and castable to UInt32. If it is not all 0s and 1s, the result will not make sense, or an error may occur. If no positive class exists in the data, NaN is returned.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual labels. Must be castable to UInt32.

required
pred str | Expr

An expression representing the column with predicted probability.

required
Source code in python/polars_ds/exprs/metrics.py
def query_roc_auc(
    actual: str | pl.Expr,
    pred: str | pl.Expr,
) -> pl.Expr:
    """
    Computes ROC AUC using `actual` as the true labels and `pred` as the predictions.

    `actual` must be binary and castable to UInt32. If it is not all 0s and 1s,
    the result will not make sense, or an error may occur. If no positive class exists
    in the data, NaN is returned.

    Parameters
    ----------
    actual
        An expression representing the actual labels. Must be castable to UInt32.
    pred
        An expression representing the column with predicted probability.
    """
    return pl_plugin(
        symbol="pl_roc_auc",
        args=[to_expr(actual).cast(pl.UInt32), to_expr(pred).cast(pl.Float64)],
        returns_scalar=True,
    )

query_smape(actual, pred)

Computes the symmetric mean absolute percentage error between actual and pred. The value is always between 0 and 1. This is the third definition in the Wikipedia article, without the factor of 100.

https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual values

required
pred str | Expr

A Polars expression representing predictions

required
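
For reference, this variant averages |a - p| / (|a| + |p|) over the rows, which is why the result stays between 0 and 1. A minimal plain-Python sketch follows; the function name `smape` is hypothetical and the sketch assumes no row has both values equal to zero, which would divide by zero.

```python
def smape(actual, pred):
    # Each term lies in [0, 1] because |a - p| <= |a| + |p|
    terms = [abs(a - p) / (abs(a) + abs(p)) for a, p in zip(actual, pred)]
    return sum(terms) / len(terms)


print(smape([100.0, 200.0], [110.0, 180.0]))  # average of 10/210 and 20/380
print(smape([1.0], [-1.0]))                   # 1.0, the upper bound (opposite signs)
```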
Source code in python/polars_ds/exprs/metrics.py
def query_smape(actual: str | pl.Expr, pred: str | pl.Expr) -> pl.Expr:
    """
    Computes the symmetric mean absolute percentage error between `actual` and `pred`.
    The value is always between 0 and 1. This is the third definition in the Wikipedia
    article, without the factor of 100.

    https://en.wikipedia.org/wiki/Symmetric_mean_absolute_percentage_error

    Parameters
    ----------
    actual
        An expression representing the actual values
    pred
        A Polars expression representing predictions
    """
    a = to_expr(actual)
    p = to_expr(pred)
    numerator = (a - p).abs()
    denominator = a.abs() + p.abs()
    return (numerator / denominator).sum() / a.count()

query_tpr_fpr(actual, pred)

Returns the TPR and FPR for all thresholds. This is useful when you want to study the thresholds or plot the ROC curve.

Parameters:

Name Type Description Default
actual str | Expr

An expression representing the actual labels. Must be castable to UInt32.

required
pred str | Expr

An expression representing the column with predicted probability.

required
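
To illustrate what "TPR and FPR for all thresholds" means, the plain-Python sketch below sweeps every distinct score as a threshold and then integrates the resulting curve. The names `tpr_fpr` and `auc_from_points` are hypothetical, and the tuple layout is an assumption made for this example; the shape of the data returned by the `pl_tpr_fpr` plugin may differ. The sketch also assumes both classes are present.

```python
def tpr_fpr(actual, pred):
    n_pos = sum(actual)
    n_neg = len(actual) - n_pos
    points = []
    # Use each distinct score as a threshold, from strictest to most permissive
    for threshold in sorted(set(pred), reverse=True):
        tp = sum(1 for y, s in zip(actual, pred) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(actual, pred) if s >= threshold and y == 0)
        points.append((threshold, tp / n_pos, fp / n_neg))
    return points


def auc_from_points(points):
    # Trapezoidal area under the (FPR, TPR) curve, starting from the origin
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for _, tpr, fpr in points:
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2
        prev_fpr, prev_tpr = fpr, tpr
    return area


points = tpr_fpr([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
for threshold, tpr, fpr in points:
    print(threshold, tpr, fpr)
print(auc_from_points(points))  # 0.75
```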
Source code in python/polars_ds/exprs/metrics.py
def query_tpr_fpr(
    actual: str | pl.Expr,
    pred: str | pl.Expr,
) -> pl.Expr:
    """
    Returns the TPR and FPR for all thresholds. This is useful when you want to study the thresholds
    or plot the ROC curve.

    Parameters
    ----------
    actual
        An expression representing the actual labels. Must be castable to UInt32.
    pred
        An expression representing the column with predicted probability.
    """
    return pl_plugin(
        symbol="pl_tpr_fpr",
        args=[to_expr(actual).cast(pl.UInt32), to_expr(pred).cast(pl.Float64)],
    )