
How do I select rows from a DataFrame based on multiple conditions

I have a pandas DataFrame that looks like this:

import pandas as pd

df = pd.DataFrame({'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
                   'group': [0, 0, 1, 1, 2, 2],
                   'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
                   'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956]})

Output:

     user  group   x1        x2
0  user 1      0  0.9  0.759740
1  user 4      0  0.9  1.106061
2  user 1      1  0.7  0.619357
3  user 4      1  0.7  1.260234
4  user 1      2  0.4  0.540633
5  user 4      2  0.4  1.437956

I want to return one row per user under this condition: if x2 is below x1, return that row. If a user has no row where x2 is below x1, return that user with the group number changed to 10.

For example, for user 1, row 2 should be selected because it has the minimum value of x2 among the rows where x2 is below x1. Row 4 has an even smaller x2, but there x2 is higher than x1. For user 4, x2 is higher than x1 in every row, so we change the group number for the row with the minimum x2 to 10.


The expected output was posted as an image, which is not available here.

Use:

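df2 = (df[df['x2'].lt(df['x1'])]    # keep rows where x2 < x1
         .set_index('group')        # idxmin will then return group labels
         .groupby('user')['x2']
         .idxmin()                  # group of the minimal x2 per user
         .reindex(df['user'].unique(), fill_value=10)  # users with no match get 10
         .reset_index(name='group'))
print(df2)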

     user  group
0  user 1      1
1  user 4     10

How it works:

First, filter the rows by the condition with boolean indexing:

print(df[df['x2'].lt(df['x1'])])
     user  group   x1        x2
0  user 1      0  0.9  0.759740
2  user 1      1  0.7  0.619357
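
As a small aside on this step, Series.lt is the method form of the < operator, so the mask can be built and inspected on its own. A minimal sketch that rebuilds the question's data; the names mask and filtered are introduced here for illustration only:

```python
import pandas as pd

df = pd.DataFrame({'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
                   'group': [0, 0, 1, 1, 2, 2],
                   'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
                   'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956]})

# Boolean mask: True where x2 is strictly below x1
mask = df['x2'].lt(df['x1'])

# The operator form builds the same mask
same = mask.equals(df['x2'] < df['x1'])

# Indexing with the mask keeps only the True rows
filtered = df[mask]
print(filtered.index.tolist())  # [0, 2]
```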

Next, the goal is to get the group of the minimal x2 per user with idxmin on the grouped x2 column (SeriesGroupBy.idxmin). Because idxmin returns index labels, group is first moved into the index with DataFrame.set_index:

print(df[df['x2'].lt(df['x1'])].set_index('group'))
         user   x1        x2
group                       
0      user 1  0.9  0.759740
1      user 1  0.7  0.619357
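
Why set_index matters can be seen by comparing the two results: idxmin returns the index label of the minimal value, so on the default index it yields a row number instead of a group. A rough sketch on the question's data; by_row and by_group are illustrative names, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
                   'group': [0, 0, 1, 1, 2, 2],
                   'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
                   'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956]})

filtered = df[df['x2'].lt(df['x1'])]

# Default index: idxmin returns the row label of the minimal x2
by_row = filtered.groupby('user')['x2'].idxmin()
print(by_row['user 1'])    # 2 (a row label)

# group as the index: idxmin returns the group label instead
by_group = filtered.set_index('group').groupby('user')['x2'].idxmin()
print(by_group['user 1'])  # 1 (a group label)
```

Note that user 4 is absent from both results, since none of its rows passes the filter.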

Then idxmin returns the group of the minimal x2 per user, and Series.reindex adds the missing users from the unique values of user, filling their group with 10:

print(df[df['x2'].lt(df['x1'])].set_index('group').groupby('user')['x2'].idxmin())
user
user 1     1
Name: x2, dtype: int64

print(df[df['x2'].lt(df['x1'])].set_index('group')
        .groupby('user')['x2'].idxmin()
        .reindex(df['user'].unique(), fill_value=10))
user
user 1     1
user 4    10
Name: x2, dtype: int64
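
The effect of fill_value can be checked in isolation. The sketch below uses a toy Series standing in for the idxmin result (the values are made up for illustration) and compares reindex with and without fill_value:

```python
import pandas as pd

# Toy Series standing in for the idxmin result (illustrative values)
s = pd.Series([1], index=pd.Index(['user 1'], name='user'))

# Without fill_value, labels missing from s become NaN
with_nan = s.reindex(['user 1', 'user 4'])
print(with_nan.isna().tolist())  # [False, True]

# With fill_value, missing labels get the given value and the integer dtype is kept
out = s.reindex(['user 1', 'user 4'], fill_value=10)
print(out.to_dict())  # {'user 1': 1, 'user 4': 10}
```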

Finally, create a two-column DataFrame with Series.reset_index.
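
The solution above returns only user and group. If the full rows are wanted instead, one possible variation is sketched below. It assumes that, for a user with no row where x2 < x1, the row with the smallest x2 overall should be returned with group set to 10, which is one reading of the question. The names hit, fallback, missing and rows are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
                   'group': [0, 0, 1, 1, 2, 2],
                   'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
                   'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956]})

mask = df['x2'] < df['x1']

# Row label of the minimal x2 per user, among rows where x2 < x1
hit = df[mask].groupby('user')['x2'].idxmin()

# Row label of the minimal x2 per user over all rows (used as a fallback)
fallback = df.groupby('user')['x2'].idxmin()

# Users that have no row satisfying the condition
missing = fallback.index.difference(hit.index)

# Matching rows as they are, fallback rows with group overwritten by 10
rows = pd.concat([df.loc[hit.tolist()],
                  df.loc[fallback.loc[missing].tolist()].assign(group=10)])
print(rows)
```

On the question's data this selects row 2 for user 1 and row 1 for user 4, the latter with group changed to 10.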
