Repeat rows in a Polars DataFrame based on a column value

Question

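I would like to expand the following Polars DataFrame by repeating each row according to the value in its Quantity column.

Original DataFrame:

Fruit	Quantity
Apple	2
Banana	3

Expected output:

Fruit	Quantity
Apple	1
Apple	1
Banana	1
Banana	1
Banana	1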

There is a very similar question that uses Pandas rather than Polars: Repeat rows in a pandas DataFrame based on column value

The polars repeat function does not seem to offer the same functionality as its Pandas counterpart: https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.repeat.html

Answer 1

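You were close. What you are looking for is the repeat_by expression.

First, some data. I'll add an ID column, just to show how to apply the repeat_by expression to multiple columns (but exclude Quantity).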

import polars as pl

df = (
    pl.DataFrame({
        'ID' : [100, 200],
        'Fruit': ["Apple", "Banana"],
        'Quantity': [2, 3],
    })
)
df

shape: (2, 3)
┌─────┬────────┬──────────┐
│ ID  ┆ Fruit  ┆ Quantity │
│ --- ┆ ---    ┆ ---      │
│ i64 ┆ str    ┆ i64      │
╞═════╪════════╪══════════╡
│ 100 ┆ Apple  ┆ 2        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 3        │
└─────┴────────┴──────────┘

The algorithm

(
    df
    .select(
        pl.exclude('Quantity').repeat_by('Quantity').explode()
    )
    .with_columns(
        pl.lit(1).alias('Quantity')
    )
)

shape: (5, 3)
┌─────┬────────┬──────────┐
│ ID  ┆ Fruit  ┆ Quantity │
│ --- ┆ ---    ┆ ---      │
│ i64 ┆ str    ┆ i32      │
╞═════╪════════╪══════════╡
│ 100 ┆ Apple  ┆ 1        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 100 ┆ Apple  ┆ 1        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 1        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 1        │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana ┆ 1        │
└─────┴────────┴──────────┘

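How it works

The repeat_by expression repeats each value in a Series by the corresponding value in another column or expression. In this case, we want to repeat by the value in Quantity.

We also use the exclude expression to apply repeat_by to every column except Quantity (which we will replace later).

Note that the result of a repeat_by expression is a list.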

(
    df
    .select(
        pl.exclude('Quantity').repeat_by('Quantity')
    )
)

shape: (2, 2)
┌─────────────────┬────────────────────────────────┐
│ ID              ┆ Fruit                          │
│ ---             ┆ ---                            │
│ list[i64]       ┆ list[str]                      │
╞═════════════════╪════════════════════════════════╡
│ [100, 100]      ┆ ["Apple", "Apple"]             │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ [200, 200, 200] ┆ ["Banana", "Banana", "Banana"] │
└─────────────────┴────────────────────────────────┘

Next, we use explode, which takes each element of each list and places it on its own row.

(
    df
    .select(
        pl.exclude('Quantity').repeat_by('Quantity').explode()
    )
)

shape: (5, 2)
┌─────┬────────┐
│ ID  ┆ Fruit  │
│ --- ┆ ---    │
│ i64 ┆ str    │
╞═════╪════════╡
│ 100 ┆ Apple  │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 100 ┆ Apple  │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 200 ┆ Banana │
└─────┴────────┘

From there, we use the lit expression to add Quantity back to the DataFrame, with every row set to 1.

Answer 1 posted: 2022-08-29 02:18:04