Fill NAs forwards or backwards if values in other columns are the same

Given this example:

import pandas as pd
df = pd.DataFrame({
    "date": ["20180724", "20180725", "20180731", "20180723", "20180731"],
    "identity": [None, "A123456789", None, None, None],
    "hid": [12345, 12345, 12345, 54321, 54321],
    "hospital": ["A", "A", "A", "B", "B"],
    "result": [70, None, 100, 90, 78]
})

Because the first three rows share the same hid and hospital, their identity values should also be identical. The other two rows also share the same hid and hospital, but no known identity was provided for them, so their identity values should remain missing. In other words, the desired output is:

       date    identity    hid hospital  result
0  20180724  A123456789  12345        A    70.0
1  20180725  A123456789  12345        A     NaN
2  20180731  A123456789  12345        A   100.0
3  20180723        None  54321        B    90.0
4  20180731        None  54321        B    78.0
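The rule above assumes that each (hid, hospital) group carries at most one distinct known identity. A minimal sketch for checking that assumption before filling, using the same example data; the names counts and conflicts are illustrative, not part of the original question:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["20180724", "20180725", "20180731", "20180723", "20180731"],
    "identity": [None, "A123456789", None, None, None],
    "hid": [12345, 12345, 12345, 54321, 54321],
    "hospital": ["A", "A", "A", "B", "B"],
    "result": [70, None, 100, 90, 78],
})

# Number of distinct non-missing identities per (hid, hospital) group;
# nunique() ignores missing values by default (dropna=True).
counts = df.groupby(["hid", "hospital"])["identity"].nunique()

# Groups with more than one distinct identity cannot be filled unambiguously.
conflicts = counts[counts > 1]

print(counts)
print("conflicting groups:", list(conflicts.index))
```

Here the (12345, "A") group has one known identity and the (54321, "B") group has none, so there are no conflicts.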

I can loop through all combinations of hid and hospital, e.g. for hid, hospital in df[["hid", "hospital"]].drop_duplicates().itertuples(index=False), but I don't know what to do next.
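For comparison, the loop from the question can be completed with a boolean mask per group. This is a sketch, assuming each group has at most one distinct known identity (it simply takes the first one it finds); the names mask and known are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["20180724", "20180725", "20180731", "20180723", "20180731"],
    "identity": [None, "A123456789", None, None, None],
    "hid": [12345, 12345, 12345, 54321, 54321],
    "hospital": ["A", "A", "A", "B", "B"],
    "result": [70, None, 100, 90, 78],
})

# Loop over each distinct (hid, hospital) pair, as in the question.
for hid, hospital in df[["hid", "hospital"]].drop_duplicates().itertuples(index=False):
    # Boolean mask selecting the rows of this group.
    mask = (df["hid"] == hid) & (df["hospital"] == hospital)
    # Known (non-missing) identities in this group.
    known = df.loc[mask, "identity"].dropna()
    if not known.empty:
        # Assumption: at most one distinct identity per group, so take the first.
        df.loc[mask, "identity"] = known.iloc[0]

print(df)
```

This works, but it scans the whole frame once per group; the vectorized groupby approach below avoids that.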

Use groupby and apply in combination with ffill and bfill:

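# group_keys=False keeps the result on the original row index so it aligns with df
# (since pandas 2.0, apply would otherwise prepend the group keys to the index)
df['identity'] = df.groupby(['hid', 'hospital'], group_keys=False)['identity'].apply(lambda x: x.ffill().bfill())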

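This fills NaNs forward and then backward within each (hid, hospital) group, so a known identity propagates to every row of its own group while values never leak into other groups. Groups with no known identity stay missing.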
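An equivalent that sidesteps the index-alignment question is GroupBy.transform, which always returns a result aligned to the original rows. The sketch below also shows a swapped-in alternative, transform("first"), which broadcasts the first non-missing identity of each group; the names keys, filled and first_known are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["20180724", "20180725", "20180731", "20180723", "20180731"],
    "identity": [None, "A123456789", None, None, None],
    "hid": [12345, 12345, 12345, 54321, 54321],
    "hospital": ["A", "A", "A", "B", "B"],
    "result": [70, None, 100, 90, 78],
})

keys = ["hid", "hospital"]

# transform returns a Series indexed like df, so it can be assigned back directly.
filled = df.groupby(keys)["identity"].transform(lambda s: s.ffill().bfill())

# Alternative: broadcast each group's first non-missing identity to all its rows
# ("first" skips missing values by default).
first_known = df.groupby(keys)["identity"].transform("first")

df["identity"] = filled
print(df)
```

The two variants agree whenever a group has at most one distinct identity. If a group contained several, ffill/bfill would give each row its nearest known value by row order, whereas "first" would give every row the same value.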
