Fill NAs forwards or backwards if values in other columns are the same
Given this example:
import pandas as pd
df = pd.DataFrame({
    "date": ["20180724", "20180725", "20180731", "20180723", "20180731"],
    "identity": [None, "A123456789", None, None, None],
    "hid": [12345, 12345, 12345, 54321, 54321],
    "hospital": ["A", "A", "A", "B", "B"],
    "result": [70, None, 100, 90, 78]
})
Because the first three rows have the same hid and hospital, the values in identity should also be identical. The other two rows also share the same hid and hospital, but no known identity was provided for them, so their identity values should remain missing. In other words, the desired output is:
       date    identity    hid hospital  result
0  20180724  A123456789  12345        A    70.0
1  20180725  A123456789  12345        A     NaN
2  20180731  A123456789  12345        A   100.0
3  20180723        None  54321        B    90.0
4  20180731        None  54321        B    78.0
I can loop through all combinations of hid and hospital with something like for hid, hospital in df[["hid", "hospital"]].drop_duplicates().itertuples(index=False), but I don't know what to do next.
Use groupby and apply in combination with ffill and bfill:
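# group_keys=False keeps the original row index, so the result aligns with df on pandas 2.0 and later
df['identity'] = df.groupby(['hid', 'hospital'], group_keys=False)['identity'].apply(lambda x: x.ffill().bfill())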
This fills missing values forward and then backward within each (hid, hospital) group, so a value never leaks from one group into another. A group with no known identity at all stays missing.
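For anyone who wants to check the behaviour end to end, here is a minimal, self-contained sketch that rebuilds the example frame and applies the same per-group fill. It uses transform rather than apply, which is a swapped-in variant and not the exact call from the answer; transform always returns a result indexed like the input, so the assignment back to the column lines up row by row. The helper name fill_within_group is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["20180724", "20180725", "20180731", "20180723", "20180731"],
    "identity": [None, "A123456789", None, None, None],
    "hid": [12345, 12345, 12345, 54321, 54321],
    "hospital": ["A", "A", "A", "B", "B"],
    "result": [70, None, 100, 90, 78],
})


def fill_within_group(values: pd.Series) -> pd.Series:
    """Hypothetical helper: forward-fill, then backward-fill, one group's values."""
    return values.ffill().bfill()


# transform calls the helper once per (hid, hospital) group and returns a
# Series aligned with df's original index.
df["identity"] = (
    df.groupby(["hid", "hospital"])["identity"].transform(fill_within_group)
)

print(df)
```

The three rows for hid 12345 should end up with the same identity, while the two rows for hid 54321 stay missing because that group has nothing to fill from. The result column is untouched, since only identity is selected after the groupby.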