Pandas fill value in another column depending upon GroupBy values
How can we create a new column, i.e. error_flag_type, whose value depends on a groupby on student_id?
The data looks like this:

| student_id | subject_id | error | team |
|---|---|---|---|
| 1 | 1 | yes | A |
| 1 | 2 | | A |
| 1 | 3 | | A |
| 1 | | yes | A |
| 2 | 4 | | B |
| 2 | 5 | | B |
| 2 | | yes | B |
| 3 | 6 | | B |
| 3 | 7 | | B |
| 3 | 8 | | B |
| 3 | 9 | yes | B |
| 4 | 10 | | A |
| 4 | 11 | | A |
| 4 | 12 | | A |

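To make the question reproducible, the table can be built as a DataFrame roughly as follows. This is only a sketch: it assumes the blank cells are missing values, and the name `df` is picked to match the variable used in the answer's code.

```python
import numpy as np
import pandas as pd

# Assumption: blank cells in the table above are missing values (NaN / None).
# subject_id becomes a float column because it contains NaN.
df = pd.DataFrame({
    "student_id": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4],
    "subject_id": [1, 2, 3, np.nan, 4, 5, np.nan, 6, 7, 8, 9, 10, 11, 12],
    "error": ["yes", None, None, "yes",
              None, None, "yes",
              None, None, None, "yes",
              None, None, None],
    "team": ["A", "A", "A", "A",
             "B", "B", "B",
             "B", "B", "B", "B",
             "A", "A", "A"],
})

print(df)
```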
The step-by-step operation looks like this:
1/4) If both student_id and subject_id have the value yes in the error column:

| student_id | subject_id | error | team |
|---|---|---|---|
| 1 | 1 | yes | A |
| 1 | 2 | | A |
| 1 | 3 | | A |
| 1 | | yes | A |

Thus, the outcome:

| student_id | error | team | error_flag_type |
|---|---|---|---|
| 1 | yes | A | both |

2/4) If only subject_id has the value yes in the error column:

| student_id | subject_id | error | team |
|---|---|---|---|
| 2 | 4 | | B |
| 2 | 5 | | B |
| 2 | | yes | B |

Becomes:

| student_id | error | team | error_flag_type |
|---|---|---|---|
| 2 | yes | B | student_id_level |

3/4)

| student_id | subject_id | error | team |
|---|---|---|---|
| 3 | 6 | | B |
| 3 | 7 | | B |
| 3 | 8 | | B |
| 3 | 9 | yes | B |

Becomes:

| student_id | error | team | error_flag_type |
|---|---|---|---|
| 3 | yes | B | subject_id_level |

4/4)

| student_id | subject_id | error | team |
|---|---|---|---|
| 4 | 10 | | A |
| 4 | 11 | | A |
| 4 | 12 | | A |

Becomes:

| student_id | error | team | error_flag_type |
|---|---|---|---|
| 4 | no_error | A | no_error |

Putting all the individual steps together:

| student_id | error | team | error_flag_type |
|---|---|---|---|
| 1 | yes | A | both |
| 2 | yes | B | student_id_level |
| 3 | yes | B | subject_id_level |
| 4 | no_error | A | no_error |

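Read together, the four cases seem to amount to a small decision rule per student. The sketch below is one reading of the tables, not something the question states outright: it assumes a yes on a row without a subject_id counts as a student-level error and a yes on a row with a subject_id counts as a subject-level error. The function and parameter names are hypothetical.

```python
def error_flag_type(student_level_error: bool, subject_level_error: bool) -> str:
    """Map the kinds of error seen for one student to a flag label.

    Assumption: student_level_error means a 'yes' on a row with no subject_id,
    subject_level_error means a 'yes' on a row that has a subject_id.
    """
    if student_level_error and subject_level_error:
        return "both"
    if student_level_error:
        return "student_id_level"
    if subject_level_error:
        return "subject_id_level"
    return "no_error"


# Students 1 to 4 from the sample data, in order.
flags = [
    error_flag_type(True, True),
    error_flag_type(True, False),
    error_flag_type(False, True),
    error_flag_type(False, False),
]
print(flags)
```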
I would approach this using an ordered Categorical, which gives you the flexibility to choose the order of the errors/flags. A simple groupby.agg with max then gives you the highest warning:

    import numpy as np
    import pandas as pd

    # m1: rows where student_id equals subject_id; m2: rows flagged with an error
    m1 = df['student_id'].eq(df['subject_id'])
    m2 = df['error'].eq('yes')

    (df.assign(
        # ordered categories, so 'max' picks the most severe value per group
        error=pd.Categorical(df['error'].fillna('no_error'),
                             categories=['no_error', 'yes'],
                             ordered=True),
        error_flag_type=pd.Categorical(
            np.select([m1 & m2, m2],
                      ['both', 'student_id_level'],
                      'no_error'),
            categories=['no_error', 'student_id_level', 'both'],
            ordered=True),
       )
       .groupby('student_id', as_index=False)
       .agg({'error': 'max', 'team': 'first', 'error_flag_type': 'max'})
    )

Output:

       student_id     error team   error_flag_type
    0           1       yes    A              both
    1           2       yes    B  student_id_level
    2           3       yes    B  student_id_level
    3           4  no_error    A          no_error

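Note that the output above labels student 3 as student_id_level, whereas the question's expected table has subject_id_level for that student. A minimal sketch that reproduces the expected table is given below. It rests on an assumption that the answer does not make: a yes on a row with a blank subject_id is treated as a student-level error, and a yes on a row that has a subject_id as a subject-level error. The sample data is rebuilt with blanks as missing values, and the helper columns `student_err` and `subject_err` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are assumed to be missing values.
df = pd.DataFrame({
    "student_id": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4],
    "subject_id": [1, 2, 3, np.nan, 4, 5, np.nan, 6, 7, 8, 9, 10, 11, 12],
    "error": ["yes", None, None, "yes", None, None, "yes",
              None, None, None, "yes", None, None, None],
    "team": ["A", "A", "A", "A", "B", "B", "B",
             "B", "B", "B", "B", "A", "A", "A"],
})

is_err = df["error"].eq("yes")
has_subject = df["subject_id"].notna()

# One row per student, recording which kinds of error occurred at all.
per_student = (
    df.assign(student_err=is_err & ~has_subject,   # 'yes' without a subject_id
              subject_err=is_err & has_subject)    # 'yes' with a subject_id
      .groupby("student_id", as_index=False)
      .agg(team=("team", "first"),
           student_err=("student_err", "any"),
           subject_err=("subject_err", "any"))
)

# The first matching condition wins, so 'both' has to come first.
per_student["error_flag_type"] = np.select(
    [per_student["student_err"] & per_student["subject_err"],
     per_student["student_err"],
     per_student["subject_err"]],
    ["both", "student_id_level", "subject_id_level"],
    default="no_error",
)
per_student["error"] = np.where(
    per_student["error_flag_type"].eq("no_error"), "no_error", "yes"
)

result = per_student[["student_id", "error", "team", "error_flag_type"]]
print(result)
```

For the sample data this gives both, student_id_level, subject_id_level and no_error for students 1 to 4. Plain boolean columns with `any` are used here instead of an ordered Categorical because student-level and subject-level errors are not ranked against each other in the question; they only combine into both.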