How to increment a column based on the value in the previous row while using groupby in a Pandas dataframe?
I have the following dataframe:
claim diagnosis sequence
100 1 1.0
100 2 1.0
100 3 NaN
100 4 NaN
105 1 1.0
105 2 2.0
105 3 2.0
105 4 NaN
111 1 1.0
111 2 2.0
111 3 3.0
111 4 NaN
What I need is to replace every NaN, within each claim, with the previous row's value plus one:
claim diagnosis sequence
100 1 1.0
100 2 1.0
100 3 2.0
100 4 3.0
105 1 1.0
105 2 2.0
105 3 2.0
105 4 3.0
111 1 1.0
111 2 2.0
111 3 3.0
111 4 4.0
I tried cumcount, but couldn't get it to build on the previous value. I also tried loc, but I don't understand it well enough yet.
import pandas as pd

# Rows without a 'sequence' key become NaN in the resulting dataframe.
things = [{'claim': 100, 'diagnosis': 1, 'sequence': 1},
          {'claim': 100, 'diagnosis': 2, 'sequence': 1},
          {'claim': 100, 'diagnosis': 3},
          {'claim': 100, 'diagnosis': 4},
          {'claim': 105, 'diagnosis': 1, 'sequence': 1},
          {'claim': 105, 'diagnosis': 2, 'sequence': 2},
          {'claim': 105, 'diagnosis': 3, 'sequence': 2},
          {'claim': 105, 'diagnosis': 4},
          {'claim': 111, 'diagnosis': 1, 'sequence': 1},
          {'claim': 111, 'diagnosis': 2, 'sequence': 2},
          {'claim': 111, 'diagnosis': 3, 'sequence': 3},
          {'claim': 111, 'diagnosis': 4}]
df = pd.DataFrame(things)
df
I've been racking my brain over this for days; any help would be great.
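Use cumsum to count how many NaNs there are up to and including each row within its claim, then add that count to the forward-filled (ffill) sequence: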
s1=df['sequence'].isnull().groupby(df['claim']).cumsum()
df['sequence']=s1+df.groupby('claim')['sequence'].ffill()
df
Out[145]:
claim diagnosis sequence
0 100 1 1.0
1 100 2 1.0
2 100 3 2.0
3 100 4 3.0
4 105 1 1.0
5 105 2 2.0
6 105 3 2.0
7 105 4 3.0
8 111 1 1.0
9 111 2 2.0
10 111 3 3.0
11 111 4 4.0
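To make the two intermediate pieces easier to see, here is a minimal self-contained sketch of the same cumsum-plus-ffill idea. The function name fill_sequence_by_claim and the shortened sample data are only illustrative and are not part of the original answer.

```python
import numpy as np
import pandas as pd


def fill_sequence_by_claim(df):
    """Fill NaNs in 'sequence' with the previous value plus one, per claim.

    Hypothetical helper; it returns a copy and leaves the input untouched.
    """
    out = df.copy()
    # Running count of missing values within each claim (0 before the first NaN).
    nan_count = out["sequence"].isna().astype(int).groupby(out["claim"]).cumsum()
    # Last known sequence value within each claim.
    last_seen = out.groupby("claim")["sequence"].ffill()
    # Known rows get last_seen + 0; each NaN row gets last_seen + its NaN count.
    out["sequence"] = last_seen + nan_count
    return out


df = pd.DataFrame({
    "claim": [100, 100, 100, 100, 105, 105, 105, 105],
    "diagnosis": [1, 2, 3, 4, 1, 2, 3, 4],
    "sequence": [1, 1, np.nan, np.nan, 1, 2, 2, np.nan],
})
result = fill_sequence_by_claim(df)
print(result["sequence"].tolist())
# [1.0, 1.0, 2.0, 3.0, 1.0, 2.0, 2.0, 3.0]
```

Note that this sketch assumes the missing values only appear at the end of each claim, as they do in the question's data. If a NaN sat between two known values, the running count would also be added to the later known rows, so a different grouping would be needed for that case.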