How do I select rows from a DataFrame based on multiple conditions?

Question

I have a pandas DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
                   'group': [0, 0, 1, 1, 2, 2],
                   'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
                   'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956]})

Output:

     user  group   x1        x2
0  user 1      0  0.9  0.759740
1  user 4      0  0.9  1.106061
2  user 1      1  0.7  0.619357
3  user 4      1  0.7  1.260234
4  user 1      2  0.4  0.540633
5  user 4      2  0.4  1.437956

For each user, I want to return the row where x2 is below x1 (the one with the smallest x2 among such rows). If a user has no row where x2 is below x1, I want to return that user with the group number changed to 10.

For example, for user 1, row 2 should be selected: among the rows where x2 is below x1 (rows 0 and 2), it has the smallest x2. Row 4 has an even smaller x2, but there x2 is higher than x1. For user 4, x2 is higher than x1 in every row, so we take the row with the minimum x2 and change its group number to 10.
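To make the rule concrete, here is a small sketch (not part of the original question) that walks through the sample data user by user. For each user it reports the group of the overall minimum x2 and the group the rule actually selects. The dictionary name expected is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
                   'group': [0, 0, 1, 1, 2, 2],
                   'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
                   'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956]})

expected = {}
for user, rows in df.groupby('user'):
    # Group of the overall smallest x2, ignoring the x2 < x1 condition
    overall_min_group = int(rows.loc[rows['x2'].idxmin(), 'group'])
    # Only the rows where x2 is below x1
    below = rows[rows['x2'] < rows['x1']]
    if below.empty:
        # No qualifying row: this user should end up with group 10
        expected[user] = 10
    else:
        # Smallest x2 among the qualifying rows
        expected[user] = int(below.loc[below['x2'].idxmin(), 'group'])
    print(user, '| overall min x2 in group', overall_min_group,
          '| selected group', expected[user])
```

For user 1 the overall minimum sits in group 2, but the rule selects group 1; user 4 has no qualifying row and gets 10.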

The expected output:

Answer 1

Use:

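df2 = (df[df['x2'].lt(df['x1'])]                         # keep rows where x2 < x1
           .set_index('group')                           # make group the index so idxmin returns it
           .groupby('user')['x2']
           .idxmin()                                     # group of the minimal x2 per user
           .reindex(df['user'].unique(), fill_value=10)  # users without such a row get 10
           .reset_index(name='group'))                   # back to a user/group DataFrame
print(df2)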

     user  group
0  user 1      1
1  user 4     10

How it works:

First, filter the rows by the condition using boolean indexing:

print (df[df['x2'].lt(df['x1'])])
     user  group   x1        x2
0  user 1      0  0.9  0.759740
2  user 1      1  0.7  0.619357
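The filter works because df['x2'].lt(df['x1']) is an element-wise comparison that yields one boolean per row, the same mask as df['x2'] < df['x1']; indexing df with it keeps only the True rows. A quick check on the sample data (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
                   'group': [0, 0, 1, 1, 2, 2],
                   'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
                   'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956]})

mask = df['x2'].lt(df['x1'])              # element-wise x2 < x1, one bool per row
same = mask.equals(df['x2'] < df['x1'])   # .lt and < produce the same mask
print(mask.tolist())                      # [True, False, True, False, False, False]
print(df[mask].index.tolist())            # [0, 2]: labels of the rows that pass
```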

Then get the group of the minimal x2 per user with SeriesGroupBy.idxmin. Because idxmin returns index labels, first move group into the index with DataFrame.set_index:

print (df[df['x2'].lt(df['x1'])].set_index('group'))
         user   x1        x2
group                       
0      user 1  0.9  0.759740
1      user 1  0.7  0.619357
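Why set_index is needed can be seen by running idxmin with and without it: idxmin returns index labels, so with the default RangeIndex you get the row label, and only after moving group into the index do you get the group value. A minimal sketch on the sample data (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
                   'group': [0, 0, 1, 1, 2, 2],
                   'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
                   'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956]})

filtered = df[df['x2'].lt(df['x1'])]

# Default RangeIndex: idxmin gives the row label of the minimal x2
by_row = filtered.groupby('user')['x2'].idxmin()
# group as the index: the same call now gives the group value
by_group = filtered.set_index('group').groupby('user')['x2'].idxmin()

print(by_row['user 1'])    # 2 (row label)
print(by_group['user 1'])  # 1 (group)
```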

Next, call idxmin per user to get the group of the minimal x2, then add the missing users with Series.reindex over the unique users, filling them with 10:

print (df[df['x2'].lt(df['x1'])].set_index('group').groupby('user')['x2'].idxmin())
user
user 1     1
Name: x2, dtype: int64

print (df[df['x2'].lt(df['x1'])].set_index('group')
        .groupby('user')['x2'].idxmin()
        .reindex(df['user'].unique(), fill_value=10))
user
user 1     1
user 4    10
Name: x2, dtype: int64
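The fill_value argument also keeps the integer dtype: without it, reindex inserts NaN for the new label and the values become float. A minimal sketch, rebuilding the pre-reindex Series by hand (per_user and all_users are illustrative names):

```python
import pandas as pd

# Per-user result before reindexing, rebuilt by hand for this sketch
per_user = pd.Series([1], index=pd.Index(['user 1'], name='user'), name='x2')
all_users = ['user 1', 'user 4']

with_fill = per_user.reindex(all_users, fill_value=10)  # missing user gets 10
without_fill = per_user.reindex(all_users)              # missing user gets NaN

print(with_fill.dtype, with_fill.tolist())
print(without_fill.dtype, without_fill.tolist())
```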

Finally, turn the Series into a two-column DataFrame with Series.reset_index, naming the values column group.
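The answer returns only user and group. If you need the full rows, as the question's wording suggests, one possible variant is sketched below. It is an alternative, not the answerer's code, and it assumes that for a user with no x2 < x1 row you want the row with the smallest x2 overall, with its group set to 10.

```python
import pandas as pd

df = pd.DataFrame({'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
                   'group': [0, 0, 1, 1, 2, 2],
                   'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
                   'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956]})

below = df['x2'].lt(df['x1'])
# True on every row of a user that has at least one x2 < x1 row
has_below = below.groupby(df['user']).transform('any')

# Candidates: qualifying rows for users that have them, all rows otherwise
candidates = df[below | ~has_below]
# Row with the smallest x2 per user among the candidates
result = candidates.loc[candidates.groupby('user')['x2'].idxmin()].copy()
# Users without any x2 < x1 row get group 10
result.loc[~has_below.loc[result.index], 'group'] = 10
print(result)
```

Here user 1 keeps row 2 (group 1) and user 4 keeps row 1 with its group replaced by 10.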

Solution 1 · 1 · accepted 2023-01-05 11:06:33
