How can I fill in missing values in pandas dataframe using conditions on the data?
How can I fill missing data sequentially in pandas?
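I have data like the sample below. As shown in the week_imputed column, I need to impute the week number within each group. I know the starting week of each group and the number of weeks in each year. I tried using ffill(), but it does not work here because it repeats the last known week instead of counting up from it. Is there a built-in function or an efficient way to do this?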
group year week week_imputed
A 2016 43 43
A 2016 44 44
A 2016 NaN 45
A 2016 NaN 46
A 2016 NaN 47
A 2016 48 48
A 2016 49 49
A 2016 50 50
A 2016 51 51
A 2016 52 52
A 2016 NaN 53
A 2017 NaN 1
A 2017 NaN 2
A 2017 NaN 3
A 2017 NaN 4
A 2017 5 5
A 2017 NaN 6
A 2017 7 7
A 2017 NaN 8
B 2016 47 47
B 2016 NaN 48
B 2016 NaN 49
B 2016 50 50
B 2016 51 51
B 2016 NaN 52
B 2017 NaN 1
B 2017 2 2
df['week_imputed'] = (
    df.groupby([df['group'], df['year']])['week'].ffill().fillna(1).astype(int)
    + df.groupby([df['group'], df['year'], df['week'].notna().cumsum()]).cumcount()
)
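This is similar to the answer to the question about pandas fillna with an incrementing value, except that I first use ffill to fill the NaN values within each group and year, then go back to the (group, year) blocks that start with NaN and initialize them to 1, since that is the starting value you want. The cumcount term then adds each row's position within its run of missing values, so the filled weeks count upward instead of repeating.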
output:
group year week week_imputed
0 A 2016 43.0 43
1 A 2016 44.0 44
2 A 2016 NaN 45
3 A 2016 NaN 46
4 A 2016 NaN 47
5 A 2016 48.0 48
6 A 2016 49.0 49
7 A 2016 50.0 50
8 A 2016 51.0 51
9 A 2016 52.0 52
10 A 2016 NaN 53
11 A 2017 NaN 1
12 A 2017 NaN 2
13 A 2017 NaN 3
14 A 2017 NaN 4
15 A 2017 5.0 5
16 A 2017 NaN 6
17 A 2017 7.0 7
18 A 2017 NaN 8
19 B 2016 47.0 47
20 B 2016 NaN 48
21 B 2016 NaN 49
22 B 2016 50.0 50
23 B 2016 51.0 51
24 B 2016 NaN 52
25 B 2017 NaN 1
26 B 2017 2.0 2
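To make the one-liner easier to follow, here is a minimal step-by-step sketch of the same idea on the sample data from the question. The intermediate names (keys, base, run_id, offset) are only illustrative and do not appear in the original answer. It assumes that every row is exactly one week after the previous row, and that any (group, year) block beginning with NaN starts at week 1.

```python
import numpy as np
import pandas as pd

nan = np.nan

# Sample data copied from the question (27 rows).
df = pd.DataFrame({
    "group": ["A"] * 19 + ["B"] * 8,
    "year": [2016] * 11 + [2017] * 8 + [2016] * 6 + [2017] * 2,
    "week": [43, 44, nan, nan, nan, 48, 49, 50, 51, 52, nan,   # A, 2016
             nan, nan, nan, nan, 5, nan, 7, nan,               # A, 2017
             47, nan, nan, 50, 51, nan,                        # B, 2016
             nan, 2],                                          # B, 2017
})

keys = [df["group"], df["year"]]

# Step 1: carry the last known week forward inside each (group, year) block.
base = df.groupby(keys)["week"].ffill()

# Step 2: a block that begins with NaN has nothing to carry forward,
# so start it at week 1 (an assumption taken from the question).
base = base.fillna(1).astype(int)

# Step 3: label each run; a known week starts a new run and the NaNs
# that follow it share the same label.
run_id = df["week"].notna().cumsum()

# Step 4: position of each row inside its run, restarted whenever the
# group or the year changes.
offset = df.groupby(keys + [run_id]).cumcount()

# The imputed week is the carried-forward value plus the position in the run.
df["week_imputed"] = base + offset
print(df)
```

Splitting the expression this way also makes it easy to inspect base and offset separately if the result looks wrong for a particular group.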