How do I iteratively select rows in pandas based on column values?

I'm a complete newbie at pandas, so a simpler (though maybe not the most efficient or elegant) solution is appreciated. I don't mind a bit of brute force if it helps me understand the answer better.

If I have the following DataFrame:

A    B    C 
0    0    1
0    1    1

I want to loop through columns "A", "B" and "C" in that order and, during each iteration, select all the rows for which the current column is 1 and none of the previous columns are. I then want to save the result and also use it in the next iteration.

So when looking at column A, I wouldn't select anything. Then when looking at column B, I would select the second row because B==1 and A==0. Then when looking at column C, I would select the first row because C==1, A==0 and B==0.

Create a boolean mask:

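# True where a cell is 1 and is the first 1 in its row (assumes 0/1 values)
m = (df == 1) & (df.cumsum(axis=1) == 1)
# Map each column that has at least one match to the matching row labels
d = {col: df[m[col]].index.tolist() for col in df.columns if m[col].sum()}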

Output:

>>> m
       A      B      C
0  False  False   True
1  False   True  False
2  False  False   True

>>> d
{'B': [1], 'C': [0, 2]}

I slightly modified your DataFrame:

>>> df
   A  B  C
0  0  0  1
1  0  1  1
2  0  0  1
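
As a self-contained sketch of the mask approach above (assuming the frame holds only 0/1 values, and using `running` as a name introduced here for illustration), the row-wise cumulative sum equals 1 at the first 1 in each row, so combining it with `df == 1` keeps only that first 1:

```python
import pandas as pd

# Sample frame from this answer (three rows, 0/1 values only).
df = pd.DataFrame({"A": [0, 0, 0], "B": [0, 1, 0], "C": [1, 1, 1]})

# Running total along each row, left to right.
running = df.cumsum(axis=1)

# A cell is kept if it is 1 and the running total is still 1,
# i.e. no earlier column in that row was 1.
m = (df == 1) & (running == 1)

# Row labels selected per column, skipping columns with no match.
d = {col: df[m[col]].index.tolist() for col in df.columns if m[col].any()}
print(d)
```

With values other than 0 and 1 the cumulative-sum test would no longer mean "first 1 in the row", so this sketch should not be reused as-is for such data.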

Update

For the expected output on my sample:

for col in df.columns:
    # Skip columns for which no row was selected
    if m[col].any():
        print(f"\n=== {col} ===")
        print(df[m[col]])

Output:

=== B ===
   A  B  C
1  0  1  1

=== C ===
   A  B  C
0  0  0  1
2  0  0  1
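
Since the question asks for a brute-force version that carries the result into the next iteration, here is a minimal sketch of an explicit loop. It is not the mask method above; `remaining` and `selected` are names introduced here for illustration, and the column order is assumed to be A, B, C as in the question.

```python
import pandas as pd

# Same sample frame as above.
df = pd.DataFrame({"A": [0, 0, 0], "B": [0, 1, 0], "C": [1, 1, 1]})

remaining = df   # rows not yet claimed by an earlier column
selected = {}    # column name -> rows selected for that column

for col in ["A", "B", "C"]:
    hit = remaining[col] == 1        # rows where the current column is 1
    selected[col] = remaining[hit]   # save this iteration's result
    remaining = remaining[~hit]      # only unclaimed rows go to the next iteration

for col, rows in selected.items():
    print(col, rows.index.tolist())
```

Because rows are removed from `remaining` as soon as a column claims them, a later column can only select rows in which every earlier column was not 1.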

It seems like you can use idxmax directly. From its documentation:

Return index of first occurrence of maximum over requested axis.

NA/null values are excluded.


>>> df.idxmax()
A    0
B    1
C    0
dtype: int64

The values above are the row indexes at which your constraints are met. 1 for B means that the second row was "selected"; 0 for C means the first row was. The only issue is that, if nothing is found (as for A here), it'll also return 0.

To address that, you can use where:

>>> df.idxmax().where(~df.eq(0).all())

This will make sure that NaN is returned for all-zero columns:

A    NaN
B    1.0
C    0.0
dtype: float64
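
A runnable sketch of the idxmax approach on the three-row sample frame, assuming 0/1 values; `first`, `has_one` and `result` are names introduced here for illustration. Note that idxmax reports only the first row per column in which the maximum occurs and does not look at earlier columns, so it may not cover every row the question asks for (for C it reports row 0 but not row 2).

```python
import pandas as pd

df = pd.DataFrame({"A": [0, 0, 0], "B": [0, 1, 0], "C": [1, 1, 1]})

# Label of the first row holding each column's maximum.
first = df.idxmax()

# True for columns that contain at least one non-zero value.
has_one = ~df.eq(0).all()

# Keep the label only where the column is not all zeros; NaN otherwise.
result = first.where(has_one)
print(result)
```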
