How to correctly set binary flags in a Python polars dataframe

Question

While implementing a binary flag column in Python polars (v0.15.15), I ran into some seemingly strange behavior. Given a df

import polars as pl

df = pl.DataFrame({
        "col1": [0,1,2,3],
        "flag": [0,0,0,0]
    })

I set flags by OR-ing them into the current flag value, e.g. flag 2:

df = df.with_column(
        pl.when((pl.col("col1") < 1) | (pl.col("col1") >= 3))
        .then(pl.col("flag") | 2) # set flag b0010
        .otherwise(pl.col("flag"))
    )
print(df)
shape: (4, 2)
┌──────┬──────┐
│ col1 ┆ flag │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 0    ┆ 2    │
│ 1    ┆ 0    │
│ 2    ┆ 0    │
│ 3    ┆ 2    │
└──────┴──────┘

So far so good, but when I add another flag, I get something unexpected:

df = df.with_column(
        pl.when(pl.col("col1") > -1)  
        .then(pl.col("flag") | 4) # also set flag b0100
        .otherwise(pl.col("flag"))
    )
print(df)
shape: (4, 2)
┌──────┬──────┐
│ col1 ┆ flag │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 0    ┆ 6    │
│ 1    ┆ 6    │ # <-- ?! 0 | 4 is 4, not 6
│ 2    ┆ 6    │ # <-- ?! 0 | 4 is 4, not 6
│ 3    ┆ 6    │
└──────┴──────┘

Why are all flags 6 now? I expected [6, 4, 4, 6].
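Here is a plain-Python check of that expectation, with no polars involved. After the first step the flags are [2, 0, 0, 2], and the second condition, col1 > -1, holds for every row, so each flag should simply be OR-ed with 4. The constant name FLAG_B0100 is only illustrative.

```python
# Flags after the first step: bit b0010 (= 2) is set on rows 0 and 3
flags_after_first = [2, 0, 0, 2]

FLAG_B0100 = 0b0100  # = 4 (illustrative name)

# The second condition (col1 > -1) is true for every row, so OR 4 into each flag
expected = [flag | FLAG_B0100 for flag in flags_after_first]

print(expected)                         # [6, 4, 4, 6]
print(0b0010 | 0b0100 == 0b0110 == 6)   # True: 2 | 4 is 6, while 0 | 4 stays 4
```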

Doing it the other way around (setting flag 4, then flag 2) gives the expected result:

df = pl.DataFrame({"col1": [0,1,2,3], "flag": [0,0,0,0]})
df = df.with_column(
        pl.when(pl.col("col1") > -1)  
        .then(pl.col("flag") | 4)
        .otherwise(pl.col("flag"))
    )
df = df.with_column(
        pl.when((pl.col("col1") < 1) | (pl.col("col1") >= 3))
        .then(pl.col("flag") | 2)
        .otherwise(pl.col("flag"))
    )
print(df)
shape: (4, 2)
┌──────┬──────┐
│ col1 ┆ flag │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 0    ┆ 6    │
│ 1    ┆ 4    │
│ 2    ┆ 4    │
│ 3    ┆ 6    │
└──────┴──────┘
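Because OR is commutative and associative, the order in which the two flags are set should not matter. The sketch below uses a hypothetical plain-Python helper, set_flag, as a stand-in for the when/then/otherwise expression (it is not polars code). Both orders produce [6, 4, 4, 6], which suggests that the [6, 6, 6, 6] from the first ordering does not come from the bit arithmetic itself.

```python
def set_flag(flags, mask, bit):
    """Hypothetical helper: flag | bit where mask is True, else the flag unchanged.

    Plain-Python stand-in for
    pl.when(mask).then(pl.col("flag") | bit).otherwise(pl.col("flag")).
    """
    return [flag | bit if m else flag for flag, m in zip(flags, mask)]


col1 = [0, 1, 2, 3]
start = [0, 0, 0, 0]

mask_b0010 = [(c < 1) or (c >= 3) for c in col1]  # rows 0 and 3
mask_b0100 = [c > -1 for c in col1]                # every row

# Order from the question: flag 2 first, then flag 4
first_2_then_4 = set_flag(set_flag(start, mask_b0010, 2), mask_b0100, 4)

# Reversed order: flag 4 first, then flag 2
first_4_then_2 = set_flag(set_flag(start, mask_b0100, 4), mask_b0010, 2)

print(first_2_then_4)  # [6, 4, 4, 6]
print(first_4_then_2)  # [6, 4, 4, 6]
```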

What is going on here? What am I missing?

Answer 1

Series-wide bitwise operations such as OR (|) don't seem to be implemented yet; an issue has been filed on GitHub.

One workaround is apply (rather inefficient):

import polars as pl

df = pl.DataFrame({"col1": [0,1,2,3], "flag": [0,0,0,0]})

df = df.with_columns(
        pl.when((pl.col("col1") < 1) | (pl.col("col1") >= 3))
        .then(pl.col("flag").apply(lambda flag: flag | 2)) # set flag b0010
        .otherwise(pl.col("flag"))
    )
df = df.with_columns(
        pl.when(pl.col("col1") > -1)
        .then(pl.col("flag").apply(lambda flag: flag | 4)) # set/combine with flag b0100
        .otherwise(pl.col("flag"))
    )
print(df)
shape: (4, 2)
┌──────┬──────┐
│ col1 ┆ flag │
│ ---  ┆ ---  │
│ i64  ┆ i64  │
╞══════╪══════╡
│ 0    ┆ 6    │
│ 1    ┆ 4    │
│ 2    ┆ 4    │
│ 3    ┆ 6    │
└──────┴──────┘

Or, similarly, np.bitwise_or (thanks to @jqurious):

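import numpy as np
condition_for_flag = (pl.col("col1") < 1) | (pl.col("col1") >= 3)  # example condition
flag_to_set = 2  # example flag b0010
df = df.with_columns(
        pl.when(condition_for_flag)
        .then(np.bitwise_or(pl.col("flag"), flag_to_set))
        .otherwise(pl.col("flag"))
        )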

Or use np.where instead of polars' when/then/otherwise, and then convert the result back to a Series:

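import numpy as np
col1 = df["col1"].to_numpy()
flags = df["flag"].to_numpy()
condition_for_flag = (col1 < 1) | (col1 >= 3)  # example boolean mask
flag_to_set = 2  # example flag b0010
df = df.with_columns(
        pl.Series(
            np.where(condition_for_flag,
                     flags | flag_to_set,
                     flags
            )
        ).alias("flag")
    )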

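Both np.bitwise_or and np.where seem to be more efficient than apply: apply calls the Python lambda once per element, while np.bitwise_or and np.where operate on whole arrays at once. Even so, the relative performance may depend on the input size, so when in doubt, test with your specific (typical) input sizes.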
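A further option, not part of the original answer, is to avoid bitwise operators altogether. For a power-of-two bit b and a non-negative flag, the bit is set exactly when (flag // b) % 2 == 1, so setting it amounts to adding b only when that test gives 0. The helper set_bit_arithmetic below is hypothetical and written in plain Python. Polars expressions also support //, % and +, but whether an expression built this way is faster than the approaches above on a given polars version is an assumption you would need to benchmark.

```python
def set_bit_arithmetic(flag: int, bit: int) -> int:
    """Hypothetical helper: set `bit` (a power of two) in a non-negative `flag`
    using only //, % and + (no bitwise OR)."""
    already_set = (flag // bit) % 2  # 1 if the bit is already set, else 0
    return flag + bit * (1 - already_set)


flags = [2, 0, 0, 2]
result = [set_bit_arithmetic(f, 4) for f in flags]
print(result)  # [6, 4, 4, 6]

# Agrees with bitwise OR for small non-negative flags and power-of-two bits
assert all(set_bit_arithmetic(f, b) == f | b
           for f in range(64) for b in (1, 2, 4, 8, 16, 32))
```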

Answer 1
0 2023-01-19 07:59:25
