Find the value of one column based on a condition (max) on another column, for many columns, in Polars

Question

If I have this dataframe:

pl.DataFrame(dict(x=[0, 1, 2, 3], y=[5, 2, 3, 3], z=[4, 7, 8, 2]))
shape: (4, 3)
┌─────┬─────┬─────┐
│ x   ┆ y   ┆ z   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0   ┆ 5   ┆ 4   │
│ 1   ┆ 2   ┆ 7   │
│ 2   ┆ 3   ┆ 8   │
│ 3   ┆ 3   ┆ 2   │
└─────┴─────┴─────┘

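I want to find the value in x where y is at its maximum, then the value in x where z is at its maximum, and repeat this for hundreds more columns, so that I end up with something like: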

shape: (2, 2)
┌────────┬─────────┐
│ column ┆ x_value │
│ ---    ┆ ---     │
│ str    ┆ i64     │
╞════════╪═════════╡
│ y      ┆ 0       │
│ z      ┆ 2       │
└────────┴─────────┘

or

shape: (1, 2)
┌─────┬─────┐
│ y   ┆ z   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 0   ┆ 2   │
└─────┴─────┘

What is the best Polars way to do that?

Answer 1

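You could use:

pl.exclude("x") to select all columns except x.
.arg_max() to get the index of the maximum value in each selected column.
pl.col("x").take() with those indexes to get the x value at each index.
pl.concat_list() to collect all the values into a single list.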

>>> df.select(pl.concat_list(pl.col("x").take(pl.exclude("x").arg_max())))
shape: (1, 1)
┌───────────┐
│ x         │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [0, 2]    │
└───────────┘
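
For anyone unsure what the expression computes, here is a rough plain-Python equivalent of the four steps. It is only an illustration of the logic, not how Polars evaluates it internally, and the helper name arg_max is made up for this sketch.

```python
# Plain-Python illustration of the steps above (no Polars involved).
data = {"x": [0, 1, 2, 3], "y": [5, 2, 3, 3], "z": [4, 7, 8, 2]}


def arg_max(values):
    """Hypothetical helper: index of the first occurrence of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Step 1: every column except "x".
other_columns = [name for name in data if name != "x"]

# Step 2: the row index of each column's maximum.
max_indexes = [arg_max(data[name]) for name in other_columns]

# Steps 3 and 4: look up "x" at each index and collect the results in a list.
x_values = [data["x"][i] for i in max_indexes]

print(x_values)  # [0, 2]
```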

To add the column names, you could:

# Feels like this could be simplified?
columns = df.columns
columns.remove("x")
columns = pl.Series(columns).alias("column")

df.select(
   pl.concat_list(
      pl.col("x").take(pl.exclude("x").arg_max())
   ).flatten()
).with_columns(columns)

shape: (2, 2)
┌─────┬────────┐
│ x   ┆ column │
│ --- ┆ ---    │
│ i64 ┆ str    │
╞═════╪════════╡
│ 0   ┆ y      │
│ 2   ┆ z      │
└─────┴────────┘

A possible approach for the other result:

# Broadcast each column's arg_max over the frame, then use it to index into x.
(df.with_columns(pl.exclude("x").arg_max())
   .select([
      pl.col("x").take(col).first().alias(col)
      for col in df.columns
      if col != "x"
   ])
)

shape: (1, 2)
┌─────┬─────┐
│ y   ┆ z   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 0   ┆ 2   │
└─────┴─────┘

Answer 1: accepted, 2023-01-12 02:56:28