How do I select rows from a DataFrame based on multiple conditions?
I have a pandas DataFrame that looks like this:
import pandas as pd

df = pd.DataFrame({'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
                   'group': [0, 0, 1, 1, 2, 2],
                   'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
                   'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956]})
Output:
     user  group   x1        x2
0  user 1      0  0.9  0.759740
1  user 4      0  0.9  1.106061
2  user 1      1  0.7  0.619357
3  user 4      1  0.7  1.260234
4  user 1      2  0.4  0.540633
5  user 4      2  0.4  1.437956
For each user, I want to return the row with the smallest x2 among the rows where x2 is below x1. If a user has no row where x2 is below x1, I still want to return that user, but with the group number changed to 10.
For example: for user 1, row 2 should be selected, because it has the minimum x2 among the rows where x2 is below x1. Row 4 has an even smaller x2, but there x2 is higher than x1, so it does not qualify. For user 4, x2 is higher than x1 in every row, so we change the group number for the minimum x2 to 10.
The expected output:
Use:
df2 = (df[df['x2'].lt(df['x1'])]
         .set_index('group')
         .groupby('user')['x2']
         .idxmin()
         .reindex(df['user'].unique(), fill_value=10)
         .reset_index(name='group'))
print(df2)
     user  group
0  user 1      1
1  user 4     10
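The solution above returns only the user and group columns. If the full rows are wanted, as the question's wording suggests, one possible sketch is shown here. It assumes that for a user with no qualifying row, the row with the smallest x2 overall should be kept and its group set to 10; the question's expected output is not shown, so this reading is an assumption. `pick_rows` and `fill_group` are hypothetical names, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
    'group': [0, 0, 1, 1, 2, 2],
    'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
    'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956],
})


def pick_rows(frame, fill_group=10):
    """Hypothetical helper: one full row per user (assumed reading of the question)."""
    mask = frame['x2'] < frame['x1']
    # Row label of the smallest qualifying x2 per user; users without a match are absent.
    hit = frame[mask].groupby('user')['x2'].idxmin()
    # Row label of the smallest x2 per user over all rows, used as the fallback.
    fallback = frame.groupby('user')['x2'].idxmin()
    missing = fallback.index.difference(hit.index)
    labels = pd.concat([hit, fallback[missing]]).tolist()
    out = frame.loc[labels].copy()
    # Users that only appear through the fallback get the placeholder group.
    out.loc[out['user'].isin(missing), 'group'] = fill_group
    return out.sort_index()


rows = pick_rows(df)
print(rows)
```

With the sample data this keeps row 2 for user 1 unchanged and row 1 for user 4 with group set to 10.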
How it works:
First, filter the rows by the condition using boolean indexing:
print(df[df['x2'].lt(df['x1'])])
     user  group   x1        x2
0  user 1      0  0.9  0.759740
2  user 1      1  0.7  0.619357
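As a small illustration of the filtering step, the sketch below builds the same boolean mask with `Series.lt` and with the `<` operator and checks that they agree. The variable names `mask_method` and `mask_operator` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
    'group': [0, 0, 1, 1, 2, 2],
    'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
    'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956],
})

# Series.lt compares element-wise, just like the < operator.
mask_method = df['x2'].lt(df['x1'])
mask_operator = df['x2'] < df['x1']

same = mask_method.equals(mask_operator)
kept_rows = df[mask_operator].index.tolist()
print(same)       # True
print(kept_rows)  # [0, 2]
```

Either spelling works; the method form is convenient inside a longer method chain.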
Next, get the group of the minimal x2 per user with DataFrameGroupBy.idxmin. Because idxmin returns index labels, group is first moved into the index with DataFrame.set_index:
print(df[df['x2'].lt(df['x1'])].set_index('group'))
         user   x1        x2
group
0      user 1  0.9  0.759740
1      user 1  0.7  0.619357
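To see why the index matters here, the following sketch compares `idxmin` with and without `set_index('group')` on the same filtered rows. It only restates the step above; `by_row_label` and `by_group` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['user 1', 'user 4', 'user 1', 'user 4', 'user 1', 'user 4'],
    'group': [0, 0, 1, 1, 2, 2],
    'x1': [0.9, 0.9, 0.7, 0.7, 0.4, 0.4],
    'x2': [0.759740, 1.106061, 0.619357, 1.260234, 0.540633, 1.437956],
})

filtered = df[df['x2'] < df['x1']]

# With the default index, idxmin returns the original row label.
by_row_label = filtered.groupby('user')['x2'].idxmin()

# With group as the index, idxmin returns the group value instead.
by_group = filtered.set_index('group').groupby('user')['x2'].idxmin()

print(by_row_label['user 1'])  # 2
print(by_group['user 1'])      # 1
```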
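DataFrameGroupBy.idxmin now returns the group of the minimal x2 for each user that has a matching row. Then add the missing users with Series.reindex, using the unique values of user and fill_value=10: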
print(df[df['x2'].lt(df['x1'])].set_index('group').groupby('user')['x2'].idxmin())
user
user 1    1
Name: x2, dtype: int64
print(df[df['x2'].lt(df['x1'])].set_index('group')
        .groupby('user')['x2'].idxmin()
        .reindex(df['user'].unique(), fill_value=10))
user
user 1     1
user 4    10
Name: x2, dtype: int64
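The effect of `fill_value` in the reindex step can be seen on a tiny hand-built Series. This is a minimal sketch; the Series `s` simply mimics the `idxmin` result shown above.

```python
import pandas as pd

# Stand-in for the idxmin result: only user 1 had a qualifying row.
s = pd.Series([1], index=pd.Index(['user 1'], name='user'), name='x2')

# Without fill_value, the new label gets NaN and the dtype becomes float.
without_fill = s.reindex(['user 1', 'user 4'])

# With fill_value, the new label gets 10 and the values stay integers.
with_fill = s.reindex(['user 1', 'user 4'], fill_value=10)

print(without_fill)
print(with_fill)
```

Passing `fill_value=10` avoids a separate `fillna` and integer cast afterwards.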
Finally, create a two-column DataFrame with Series.reset_index.
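A short sketch of this last step, again on a hand-built Series that stands in for the reindexed result: `reset_index(name='group')` moves the index into a column and names the value column `group`.

```python
import pandas as pd

# Stand-in for the reindexed Series from the previous step.
s = pd.Series([1, 10],
              index=pd.Index(['user 1', 'user 4'], name='user'),
              name='x2')

# The index becomes the 'user' column; the values become the 'group' column.
df2 = s.reset_index(name='group')
print(df2)
```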