Group-by operation with a condition in a pandas DataFrame

Question

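I want to perform a groupby operation in pandas. For example, I want to group by the patient column and, where the treatment column == X, carry the corresponding doctor value into a new column named nurse.

For example, df: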

import pandas as pd
import numpy as np

df = pd.DataFrame({'patient': ['a','a','a','b','b','b'],
                   'treatment': ['X','Y','Y','X','Z','Z'],
                   'doctor': ['1','2','2','2','3','3']})

  patient treatment doctor
0       a         X      1
1       a         Y      2
2       a         Y      2
3       b         X      2
4       b         Z      3
5       b         Z      3

I tried:

df=df.assign(nurse=np.where(df.['treatment'].str.contains('X'),df.groupby('patient')['doctor'], np.nan))

but got an error:

SyntaxError: invalid syntax

Expected output:

    patient treatment doctor  nurse
0       a         X      1      1
1       a         Y      2      1
2       a         Y      2      1
3       b         X      2      2
4       b         Z      3      2
5       b         Z      3      2

How can I achieve this output?

Thanks

Answer 1

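The SyntaxError comes from the stray dot in df.['treatment']; it should be df['treatment']. To get the expected output, use GroupBy.apply + Series.where to keep doctor only on the rows where treatment is X, then fill the remaining gaps within each patient with ffill: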

df['nurse']=df.groupby('patient',sort=False).apply(lambda x: x['doctor'].where(x['treatment'].eq('X')).ffill()).reset_index(drop=True)
print(df)

     patient treatment doctor nurse
0       a         X      1     1
1       a         Y      2     1
2       a         Y      2     1
3       b         X      2     2
4       b         Z      3     2
5       b         Z      3     2
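
The same mask-then-fill idea can also be written without apply. The following is a minimal sketch, not part of the original answer, assuming the df from the question; the names masked and nurse_any are introduced here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'patient': ['a', 'a', 'a', 'b', 'b', 'b'],
                   'treatment': ['X', 'Y', 'Y', 'X', 'Z', 'Z'],
                   'doctor': ['1', '2', '2', '2', '3', '3']})

# Keep doctor only on rows where treatment == 'X'; all other rows become NaN.
masked = df['doctor'].where(df['treatment'].eq('X'))

# Forward-fill within each patient, so a value never leaks into the next group.
df['nurse'] = masked.groupby(df['patient']).ffill()

# Variant (illustrative): broadcast the first non-null masked value to every
# row of the group, which also covers rows that come before the 'X' row.
nurse_any = masked.groupby(df['patient']).transform('first')

print(df)
```

Because the grouped ffill returns a Series aligned to the original index, no reset_index is needed, and the result does not depend on the rows being ordered by patient. With ffill, rows that precede the X row in their group stay NaN; the transform('first') variant fills them as well.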

Solution 1
3, accepted, 2019-10-10 23:05:38
