Pandas: populate values in another column based on GroupBy values

Question

How can we create a new column, i.e. error_flag_type, whose value depends on a groupby on student_id?

The data looks like this:

student_id	subject_id	error	team
1	1	yes	A
1	2		A
1	3		A
1		yes	A
2	4		B
2	5		B
2		yes	B
3	6		B
3	7		B
3	8		B
3	9	yes	B
4	10		A
4	11		A
4	12		A
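
For anyone who wants to reproduce the problem, the table above can be built as a DataFrame roughly as follows. This is a minimal sketch that assumes the blank cells are missing values (NaN); the variable name df matches the one used in the answer.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the table above; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "student_id": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4],
    "subject_id": [1, 2, 3, np.nan, 4, 5, np.nan, 6, 7, 8, 9, 10, 11, 12],
    "error": ["yes", np.nan, np.nan, "yes",
              np.nan, np.nan, "yes",
              np.nan, np.nan, np.nan, "yes",
              np.nan, np.nan, np.nan],
    "team": ["A", "A", "A", "A",
             "B", "B", "B",
             "B", "B", "B", "B",
             "A", "A", "A"],
})

print(df)
```

Because subject_id contains NaN, pandas stores that column as float64 rather than int.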

The step-by-step operation looks like this:

1/4) If both student_id and subject_id have the value yes in the error column:

student_id	subject_id	error	team
1	1	yes	A
1	2		A
1	3		A
1		yes	A

Thus, the outcome is:

student_id	error	team	error_flag_type
1	yes	A	both

2/4) If only student_id (the row with no subject_id) has the value yes in the error column:

student_id	subject_id	error	team
2	4		B
2	5		B
2		yes	B

Becomes:

student_id	error	team	error_flag_type
2	yes	B	student_id_level

3/4) If only a subject_id row has the value yes in the error column:

student_id	subject_id	error	team
3	6		B
3	7		B
3	8		B
3	9	yes	B

Becomes:

student_id	error	team	error_flag_type
3	yes	B	subject_id_level

4/4) If no row has the value yes in the error column:

student_id	subject_id	team
4	10	A
4	11	A
4	12	A

Becomes:

student_id	error	team	error_flag_type
4	no_error	A	no_error

Putting all the individual steps together:

student_id	error	team	error_flag_type
1	yes	A	both
2	yes	B	student_id_level
3	yes	B	subject_id_level
4	no_error	A	no_error

Answer 1

I would approach this using an ordered Categorical, which gives you the flexibility to choose the order of the errors/flags. A simple groupby.agg with max then gives you the highest warning:

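import numpy as np
import pandas as pd

# m1: rows where student_id equals subject_id
# m2: rows flagged with 'yes' in the error column
m1 = df['student_id'].eq(df['subject_id'])
m2 = df['error'].eq('yes')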

(df.assign(error=pd.Categorical(df['error'].fillna('no_error'),
                                categories=['no_error', 'yes'],
                                ordered=True),
           error_flag_type=pd.Categorical(np.select([m1&m2, m2],
                                                    ['both', 'student_id_level'],
                                                    'no_error'
                                                   ),
                                          categories=['no_error',
                                                      'student_id_level',
                                                      'both'],
                                          ordered=True
                                         )
          )
    .groupby('student_id', as_index=False)
    .agg({'error': 'max', 'team': 'first', 'error_flag_type': 'max'})
)

Output:

   student_id     error team   error_flag_type
0           1       yes    A              both
1           2       yes    B  student_id_level
2           3       yes    B  student_id_level
3           4  no_error    A          no_error
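
Note that the output above labels student 3 as student_id_level, while the question expects subject_id_level. One possible way to separate the two cases is sketched below. It assumes that a flagged row with a missing subject_id is a student-level error and a flagged row with a subject_id is a subject-level error; this reading comes from the sample data, not from the answer. It swaps the ordered Categorical for one boolean flag per case, aggregated with any per student and mapped with np.select. The helper columns student_err and subject_err are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank cells are assumed to be NaN.
df = pd.DataFrame({
    "student_id": [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4],
    "subject_id": [1, 2, 3, np.nan, 4, 5, np.nan, 6, 7, 8, 9, 10, 11, 12],
    "error": ["yes", np.nan, np.nan, "yes", np.nan, np.nan, "yes",
              np.nan, np.nan, np.nan, "yes", np.nan, np.nan, np.nan],
    "team": ["A"] * 4 + ["B"] * 7 + ["A"] * 3,
})

is_error = df["error"].eq("yes")

# Assumption: a flagged row without a subject_id is a student-level error,
# and a flagged row with a subject_id is a subject-level error.
summary = (
    df.assign(student_err=is_error & df["subject_id"].isna(),
              subject_err=is_error & df["subject_id"].notna())
      .groupby("student_id", as_index=False)
      .agg(team=("team", "first"),
           student_err=("student_err", "any"),
           subject_err=("subject_err", "any"))
)

# The first matching condition wins, so "both" must be tested first.
summary["error_flag_type"] = np.select(
    [summary["student_err"] & summary["subject_err"],
     summary["student_err"],
     summary["subject_err"]],
    ["both", "student_id_level", "subject_id_level"],
    default="no_error",
)

# A student has an error if any flag type other than no_error applies.
summary["error"] = np.where(
    summary["error_flag_type"].eq("no_error"), "no_error", "yes"
)

result = summary[["student_id", "error", "team", "error_flag_type"]]
print(result)
```

On the sample data this gives both, student_id_level, subject_id_level and no_error for students 1 to 4, which matches the combined table in the question.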

Answer 1
0 2022-08-24 07:41:34