
Conditional forward fill within groupby

I have a data frame of patients and their clinic visits. Patients may take a drug at some visits, but the dose is recorded only when the drug is started or when the dose changes. If the dose is unchanged at the next visit, the record only says "drug ongoing? Yes. Dose changed? No". What I need is the exact dose at each visit.

I tried a forward fill with groupby (grouping by patient_id), but I'm stuck on how to add the condition that a missing value is filled only when the drug is ongoing and the dose has not changed.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'patient_id': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'visit_number': [1, 2, 3, 2, 3, 4, 10, 11, 12],
    'drug_ongoing': [np.nan, 1, 1, np.nan, 0, 1, 1, 1, 0],
    'drug_dose_changed': [0, 0, 0, 0, np.nan, 0, 0, 1, np.nan],
    'dose': [40, np.nan, np.nan, 60, np.nan, 70, 80, np.nan, np.nan],
})

I tried:

df['dose_filled'] = df.groupby('patient_id')['dose'].ffill()

But this fills every missing value, regardless of the two flags.

The desired new column 'dose_filled' is [40, 40, 40, 60, np.nan, 70, 80, np.nan, np.nan].

In your case, filter the rows before calling ffill:

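# Rows eligible for filling: drug ongoing with unchanged dose, or the patient's first visit.
first_visit = df.groupby('patient_id')['visit_number'].transform('first')
keep = (df['drug_ongoing'].eq(1) & df['drug_dose_changed'].eq(0)) | df['visit_number'].eq(first_visit)
s = df.loc[keep].groupby('patient_id')['dose'].ffill()
df['dose'] = df['dose'].fillna(s)  # aligns on index; rows excluded by the filter stay NaN
df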
Out[38]: 
  patient_id  visit_number  drug_ongoing  drug_dose_changed  dose
0          a             1           NaN                0.0  40.0
1          a             2           1.0                0.0  40.0
2          a             3           1.0                0.0  40.0
3          b             2           NaN                0.0  60.0
4          b             3           0.0                NaN   NaN
5          b             4           1.0                0.0  70.0
6          c            10           1.0                0.0  80.0
7          c            11           1.0                1.0   NaN
8          c            12           0.0                NaN   NaN
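
For comparison, here is a minimal sketch of a different way to get the same column: forward fill within each patient first, then accept the filled value only on rows where both flags allow it. This is not the filter-then-fill approach above, just an alternative under the assumption that a dose may be carried forward exactly when drug_ongoing is 1 and drug_dose_changed is 0. The names `filled` and `carry` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'patient_id': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'visit_number': [1, 2, 3, 2, 3, 4, 10, 11, 12],
    'drug_ongoing': [np.nan, 1, 1, np.nan, 0, 1, 1, 1, 0],
    'drug_dose_changed': [0, 0, 0, 0, np.nan, 0, 0, 1, np.nan],
    'dose': [40, np.nan, np.nan, 60, np.nan, 70, 80, np.nan, np.nan],
})

# Unconditional forward fill, restricted to each patient's own rows.
filled = df.groupby('patient_id')['dose'].ffill()

# Assumed rule: a dose may be carried forward only when the drug is
# ongoing and the dose is flagged as unchanged (NaN flags compare False).
carry = df['drug_ongoing'].eq(1) & df['drug_dose_changed'].eq(0)

# Recorded doses are kept; gaps are filled only where carrying is allowed.
df['dose_filled'] = df['dose'].fillna(filled.where(carry))
print(df['dose_filled'].tolist())
```

Because the result goes into a new column, the original dose values stay available for checking. One behavioural difference worth knowing: here a dose recorded on a row that fails the condition can still be carried to later qualifying rows, whereas the filter-then-fill version drops such rows before filling.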

I think you need:

np.where(~df['drug_dose_changed'].astype(bool), df['dose'].ffill(), df['dose'])

Output:

array([40., 40., 40., 60., nan, 70., 80., nan, nan])
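
Note that the ffill in this one-liner is not grouped, so it matches the desired output here only because every patient's first visit has a recorded dose. A small sketch of what could happen otherwise, using a hypothetical frame `demo` (not from the question) in which patient 'b' has no recorded dose at all:

```python
import numpy as np
import pandas as pd

# Hypothetical data: patient 'b' never has a dose recorded.
demo = pd.DataFrame({
    'patient_id': ['a', 'a', 'b', 'b'],
    'drug_dose_changed': [0, 0, 0, 0],
    'dose': [40, np.nan, np.nan, np.nan],
})

# eq(0) treats a missing flag as "not unchanged", like astype(bool) + ~ does.
unchanged = demo['drug_dose_changed'].eq(0)

# Ungrouped fill: patient a's dose leaks into patient b's rows.
leaky = np.where(unchanged, demo['dose'].ffill(), demo['dose'])

# Grouped fill: each patient only inherits their own earlier doses.
safe = np.where(unchanged,
                demo.groupby('patient_id')['dose'].ffill(),
                demo['dose'])

print(leaky)
print(safe)
```

If the data can contain patients whose first dose is missing, the grouped variant is probably the safer choice. Neither variant looks at drug_ongoing, so that condition would still need to be added if it matters.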
