
Itertools Expr

Iteration Helper Expressions

Iteration-related helper expressions.

Functions:

Name          Description
combinations  Get all k-combinations of non-null values in source. This is an expensive operation, as n choose k can grow very fast.
product       Get the cartesian product of two series. Only non-null values will be used.

combinations(source, k, unique=False)

Get all k-combinations of non-null values in source. This is an expensive operation, as n choose k can grow very fast.

Parameters:

Name    Type        Description                                                                          Default
source  str | Expr  Input source column, must have numeric or string type                                required
k       int         The k in N choose k                                                                  required
unique  bool        If True, deduplicate and sort the non-null source values before forming combinations  False

Examples:

>>> import polars as pl
>>> import polars_ds as pds
>>> df = pl.DataFrame({
>>>     "category": ["a", "a", "a", "b", "b"],
>>>     "values": [1, 2, 3, 4, 5]
>>> })
>>> df.select(
>>>     pds.combinations("values", 3)
>>> )
shape: (10, 1)
┌───────────┐
│ values    │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [1, 2, 3] │
│ [1, 2, 4] │
│ [1, 2, 5] │
│ [1, 3, 4] │
│ [1, 3, 5] │
│ [1, 4, 5] │
│ [2, 3, 4] │
│ [2, 3, 5] │
│ [2, 4, 5] │
│ [3, 4, 5] │
└───────────┘
>>> df.group_by("category").agg(
>>>     pds.combinations("values", 2)
>>> )
shape: (2, 2)
┌──────────┬──────────────────────────┐
│ category ┆ values                   │
│ ---      ┆ ---                      │
│ str      ┆ list[list[i64]]          │
╞══════════╪══════════════════════════╡
│ a        ┆ [[1, 2], [1, 3], [2, 3]] │
│ b        ┆ [[4, 5]]                 │
└──────────┴──────────────────────────┘
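Because the result has n choose k rows, it can be worth estimating the output size before calling combinations on a large column. The following is a minimal sketch that uses only the Python standard library; `estimated_rows` is a hypothetical helper for illustration and is not part of polars_ds.

```python
from math import comb


def estimated_rows(n_non_null: int, k: int) -> int:
    """Return n choose k, the number of k-combinations of n_non_null values.

    Hypothetical helper for illustration only; not part of polars_ds.
    """
    return comb(n_non_null, k)


# 5 non-null values with k = 3 gives the 10 rows seen in the first example.
print(estimated_rows(5, 3))   # 10
# The count grows quickly: 50 values with k = 5 is already over two million.
print(estimated_rows(50, 5))  # 2118760
```

In a group_by context the same estimate applies per group, using each group's non-null count.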
Source code in python/polars_ds/exprs/expr_iter.py
def combinations(source: str | pl.Expr, k: int, unique: bool = False) -> pl.Expr:
    """
    Get all k-combinations of non-null values in source. This is an expensive operation, as
    n choose k can grow very fast.

    Parameters
    ----------
    source
        Input source column, must have numeric or string type
    k
        The k in N choose k
    unique
        Whether to run .unique() on the source column

    Examples
    --------
    >>> df = pl.DataFrame({
    >>>     "category": ["a", "a", "a", "b", "b"],
    >>>     "values": [1, 2, 3, 4, 5]
    >>> })
    >>> df.select(
    >>>     pds.combinations("values", 3)
    >>> )
    shape: (10, 1)
    ┌───────────┐
    │ values    │
    │ ---       │
    │ list[i64] │
    ╞═══════════╡
    │ [1, 2, 3] │
    │ [1, 2, 4] │
    │ [1, 2, 5] │
    │ [1, 3, 4] │
    │ [1, 3, 5] │
    │ [1, 4, 5] │
    │ [2, 3, 4] │
    │ [2, 3, 5] │
    │ [2, 4, 5] │
    │ [3, 4, 5] │
    └───────────┘
    >>> df.group_by("category").agg(
    >>>     pds.combinations("values", 2)
    >>> )
    shape: (2, 2)
    ┌──────────┬──────────────────────────┐
    │ category ┆ values                   │
    │ ---      ┆ ---                      │
    │ str      ┆ list[list[i64]]          │
    ╞══════════╪══════════════════════════╡
    │ a        ┆ [[1, 2], [1, 3], [2, 3]] │
    │ b        ┆ [[4, 5]]                 │
    └──────────┴──────────────────────────┘
    """
    s = to_expr(source).unique().drop_nulls().sort() if unique else to_expr(source).drop_nulls()
    return pl_plugin(
        symbol="pl_combinations",
        args=[s],
        changes_length=True,
        kwargs={
            "k": k,
        },
    )
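
The preprocessing in the source above (drop nulls, and with unique=True also deduplicate and sort) can be mirrored in plain Python to reason about what reaches the plugin. This is only a sketch built on itertools: `combinations_reference` is a hypothetical name, and the assumption that the compiled `pl_combinations` plugin emits rows in the same order as itertools is based solely on the example output shown earlier.

```python
from itertools import combinations as iter_combinations


def combinations_reference(values, k, unique=False):
    """Pure-Python stand-in for the expression (illustrative, not the plugin).

    Drops None values, optionally deduplicates and sorts, then enumerates
    all k-combinations in lexicographic position order.
    """
    non_null = [v for v in values if v is not None]
    if unique:
        # Mirrors .unique().drop_nulls().sort() in the source listing.
        non_null = sorted(set(non_null))
    return [list(c) for c in iter_combinations(non_null, k)]


print(combinations_reference([1, 2, None, 3], 2))
# [[1, 2], [1, 3], [2, 3]]
print(combinations_reference([2, 1, 2, None], 2, unique=True))
# [[1, 2]]
```

Without unique=True, repeated values are treated as distinct positions, so duplicates in the source can yield repeated combinations.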

product(s1, s2)

Get the cartesian product of two series. Only non-null values will be used.

Parameters:

Name  Type        Description                 Default
s1    str | Expr  The first column / series   required
s2    str | Expr  The second column / series  required

Examples:

>>> df = pl.DataFrame({
>>>     "a": [1, 2],
>>>     "b": [4, 5],
>>> })
>>> df.select(
>>>     pds.product("a", "b")
>>> )
shape: (4, 1)
┌───────────┐
│ a         │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [1, 4]    │
│ [1, 5]    │
│ [2, 4]    │
│ [2, 5]    │
└───────────┘
>>> df = pl.DataFrame({
>>>     "a": [[1, 2], [3, 4]],
>>>     "b": [[3], [1, 2]],
>>> }).with_row_index()
>>> df
shape: (2, 3)
┌───────┬───────────┬───────────┐
│ index ┆ a         ┆ b         │
│ ---   ┆ ---       ┆ ---       │
│ u32   ┆ list[i64] ┆ list[i64] │
╞═══════╪═══════════╪═══════════╡
│ 0     ┆ [1, 2]    ┆ [3]       │
│ 1     ┆ [3, 4]    ┆ [1, 2]    │
└───────┴───────────┴───────────┘
>>> df.group_by(
>>>     "index"
>>> ).agg(
>>>     pds.product(
>>>         pl.col("a").list.explode(),
>>>         pl.col("b").list.explode(),
>>>     ).alias("product")
>>> )
shape: (2, 2)
┌───────┬────────────────────────────┐
│ index ┆ product                    │
│ ---   ┆ ---                        │
│ u32   ┆ list[list[i64]]            │
╞═══════╪════════════════════════════╡
│ 0     ┆ [[1, 3], [2, 3]]           │
│ 1     ┆ [[3, 1], [3, 2], … [4, 2]] │
└───────┴────────────────────────────┘
Source code in python/polars_ds/exprs/expr_iter.py
def product(s1: str | pl.Expr, s2: str | pl.Expr) -> pl.Expr:
    """
    Get the cartesian product of two series. Only non-null values will be used.

    Parameters
    ----------
    s1
        The first column / series
    s2
        The second column / series

    Examples
    --------
    >>> df = pl.DataFrame({
    >>> "a": [1, 2]
    >>> , "b": [4, 5]
    >>> })
    >>> df.select(
    >>>     pds.product("a", "b")
    >>> )
    shape: (4, 1)
    ┌───────────┐
    │ a         │
    │ ---       │
    │ list[i64] │
    ╞═══════════╡
    │ [1, 4]    │
    │ [1, 5]    │
    │ [2, 4]    │
    │ [2, 5]    │
    └───────────┘

    >>> df = pl.DataFrame({
    >>>     "a": [[1,2], [3,4]]
    >>>     , "b": [[3], [1, 2]]
    >>> }).with_row_index()
    >>> df
    shape: (2, 3)
    ┌───────┬───────────┬───────────┐
    │ index ┆ a         ┆ b         │
    │ ---   ┆ ---       ┆ ---       │
    │ u32   ┆ list[i64] ┆ list[i64] │
    ╞═══════╪═══════════╪═══════════╡
    │ 0     ┆ [1, 2]    ┆ [3]       │
    │ 1     ┆ [3, 4]    ┆ [1, 2]    │
    └───────┴───────────┴───────────┘

    >>> df.group_by(
    >>>     "index"
    >>> ).agg(
    >>>     pds.product(
    >>>         pl.col("a").list.explode()
    >>>         , pl.col("b").list.explode()
    >>>     ).alias("product")
    >>> )
    shape: (2, 2)
    ┌───────┬────────────────────────────┐
    │ index ┆ product                    │
    │ ---   ┆ ---                        │
    │ u32   ┆ list[list[i64]]            │
    ╞═══════╪════════════════════════════╡
    │ 0     ┆ [[1, 3], [2, 3]]           │
    │ 1     ┆ [[3, 1], [3, 2], … [4, 2]] │
    └───────┴────────────────────────────┘
    """
    return pl_plugin(
        symbol="pl_product",
        args=[to_expr(s1).drop_nulls(), to_expr(s2).drop_nulls()],
        changes_length=True,
    )
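
As with combinations, the null handling of product can be reasoned about with a plain-Python stand-in. The sketch below uses itertools.product; `product_reference` is a hypothetical name, and matching the row order of the compiled `pl_product` plugin is an assumption drawn from the first example output rather than from the plugin's source.

```python
from itertools import product as iter_product


def product_reference(s1, s2):
    """Pure-Python stand-in for the expression (illustrative, not the plugin).

    Drops None values from each input independently, then pairs every
    remaining left value with every remaining right value.
    """
    left = [v for v in s1 if v is not None]
    right = [v for v in s2 if v is not None]
    return [list(pair) for pair in iter_product(left, right)]


print(product_reference([1, 2], [4, 5]))
# [[1, 4], [1, 5], [2, 4], [2, 5]]
print(product_reference([1, None], [4, 5]))
# [[1, 4], [1, 5]]
```

Since nulls are dropped per input, the result has len(left) * len(right) rows, which need not equal the square of the original column length.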