Find value of column based on another column condition (max) in polars for many columns

If I have this dataframe:

import polars as pl

df = pl.DataFrame(dict(x=[0, 1, 2, 3], y=[5, 2, 3, 3], z=[4, 7, 8, 2]))
print(df)
shape: (4, 3)
┌─────┬─────┬─────┐
│ x   ┆ y   ┆ z   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 0   ┆ 5   ┆ 4   │
│ 1   ┆ 2   ┆ 7   │
│ 2   ┆ 3   ┆ 8   │
│ 3   ┆ 3   ┆ 2   │
└─────┴─────┴─────┘

and I want to find the value in x where y is at its maximum, then the value in x where z is at its maximum, and repeat this for hundreds more columns, so that I end up with something like:

shape: (2, 2)
┌────────┬─────────┐
│ column ┆ x_value │
│ ---    ┆ ---     │
│ str    ┆ i64     │
╞════════╪═════════╡
│ y      ┆ 0       │
│ z      ┆ 2       │
└────────┴─────────┘

or

shape: (1, 2)
┌─────┬─────┐
│ y   ┆ z   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 0   ┆ 2   │
└─────┴─────┘

What is the best polars way to do that?

You could:

  • pl.exclude("x") to select all columns except x.

  • .arg_max() to get the index of the maximum of each selected column.

  • pass the indexes to pl.col("x").take() to get the x value at each index (in newer Polars releases this method is named gather).

  • pl.concat_list() to create a list of all the values.

>>> df.select(pl.concat_list(pl.col("x").take(pl.exclude("x").arg_max())))
shape: (1, 1)
┌───────────┐
│ x         │
│ ---       │
│ list[i64] │
╞═══════════╡
│ [0, 2]    │
└───────────┘
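
To make the steps above concrete, here is a minimal plain-Python sketch of what the expression computes, without Polars. The helper name x_at_max and its key_column parameter are hypothetical, introduced only for illustration; the data is the frame from the question.

```python
# Plain-Python illustration of "arg_max, then look up x at that index".
data = {"x": [0, 1, 2, 3], "y": [5, 2, 3, 3], "z": [4, 7, 8, 2]}


def x_at_max(table, key_column="x"):
    """For every column except key_column, return the key_column value
    found at the position of that column's first maximum."""
    result = {}
    for name, values in table.items():
        if name == key_column:
            continue
        # Position of the first maximum: the role .arg_max() plays above.
        max_index = max(range(len(values)), key=values.__getitem__)
        # Look up the key column at that position: the role .take() plays.
        result[name] = table[key_column][max_index]
    return result


result = x_at_max(data)
print(result)
```

On ties, max() returns the first position it meets, so the earliest row holding the maximum is the one used.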

To add in the column names you could:

# Feels like this could be simplified?
columns = df.columns
columns.remove("x")
columns = pl.Series(columns).alias("column")

df.select(
   pl.concat_list(
      pl.col("x").take(pl.exclude("x").arg_max())
   ).flatten()
).with_columns(columns)
shape: (2, 2)
┌─────┬────────┐
│ x   ┆ column │
│ --- ┆ ---    │
│ i64 ┆ str    │
╞═════╪════════╡
│ 0   ┆ y      │
│ 2   ┆ z      │
└─────┴────────┘

Possible approach for the other result:

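# Replace every non-x column with the index of its maximum,
# then use each of those columns as the index into x.
(df.with_columns(pl.exclude("x").arg_max())
   .select([
      pl.col("x").take(col).first().alias(col)
      for col in df.columns
      if col != "x"
   ])
)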
shape: (1, 2)
┌─────┬─────┐
│ y   ┆ z   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 0   ┆ 2   │
└─────┴─────┘
