Fill rows of dataframe based on condition of other row
I have a dataframe like this:
import pandas as pd

df = pd.DataFrame({"ID1": ["A", "B", "C", "A", "C", "C", "A"],
                   "ID2": ["a", "b", "c", "a", "e", "c", "b"],
                   "Month": [1, 4, 7, 4, 2, 9, 3],
                   "Value": [10, 20, 40, 60, 20, 30, 10]})
ID1 ID2 Month Value
A a 1 10
B b 4 20
C c 7 40
A a 4 60
C e 2 20
C c 9 30
A b 3 10
I want to fill in the values for the missing months with the value of the preceding month of the same "ID1"+"ID2" combination. For example, there is no value for months 2 and 3 of the combination "A"+"a", so they should take the value of month 1. At month 4 we have a value for "A"+"a", so this value should be carried forward until another month has a value.
For the combination "C"+"c" the values should start to appear at month 7, because that is the first month with a value for the combination; the earlier months should be 0.
The final dataframe should look like this:
ID1 ID2 Month Value
A a 1 10
A a 2 10
A a 3 10
A a 4 60
A a 5 60
A a 6 60
A a 7 60
A a 8 60
A a 9 60
A a 10 60
A a 11 60
A a 12 60
B b 4 20
C c 1 0
C c 2 0
C c 3 0
C c 4 0
C c 5 0
C c 6 0
C c 7 40
C c 8 40
C c 9 30
C c 10 30
C c 11 30
C c 12 30
... ... ... ...
My first approach is probably inefficient:
Loop over the months 1:12
    Loop over the unique combinations of "ID1"+"ID2"
        If a row for "ID1"+"ID2" and month exists
            Then go to the next month
        Else look at the previous month of the "ID1"+"ID2" combination
            If the value exists
                Then take the value
            Else set the value to 0
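The steps above can be sketched in plain Python as follows. This is only an illustration of the loop idea, not a recommended solution: the function name `fill_months_loop` is made up for this example, and the loops are nested with the combinations on the outside and the months on the inside, so the previous month's value can be kept in a single variable.

```python
import pandas as pd

df = pd.DataFrame({"ID1": ["A", "B", "C", "A", "C", "C", "A"],
                   "ID2": ["a", "b", "c", "a", "e", "c", "b"],
                   "Month": [1, 4, 7, 4, 2, 9, 3],
                   "Value": [10, 20, 40, 60, 20, 30, 10]})

def fill_months_loop(df, months=range(1, 13)):
    # Hypothetical helper: look up known values by (ID1, ID2, Month).
    known = {(r.ID1, r.ID2, r.Month): r.Value
             for r in df.itertuples(index=False)}
    combos = sorted(set(zip(df["ID1"], df["ID2"])))
    rows = []
    for id1, id2 in combos:
        last = 0  # value used before the first observed month
        for month in months:
            # Use the observed value if there is one, else keep the previous one.
            last = known.get((id1, id2, month), last)
            rows.append((id1, id2, month, last))
    return pd.DataFrame(rows, columns=["ID1", "ID2", "Month", "Value"])

looped = fill_months_loop(df)
```

It assumes at most one row per combination and month; with duplicates, the last row in `df` wins.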
Is there a better way to do this or maybe a package that could help me calculate this efficiently?
Define the following function to process each group:
import pandas as pd

def proc(grp):
    # Reindex to months 1..12, forward-fill, and use 0 before the first value.
    wrk = (grp.set_index('Month')['Value'].reindex(range(1, 13))
           .ffill().fillna(0).astype(int))
    # grp.name holds the (ID1, ID2) key of the current group.
    id1, id2 = grp.name
    wrk.index = pd.MultiIndex.from_product([[id1], [id2], wrk.index],
                                           names=['ID1', 'ID2', 'Month'])
    return wrk
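To see what the reindex / ffill / fillna chain inside the function does, here is a small sketch that applies it step by step to the two rows of the ("C", "c") combination. The final `astype(int)` is my own choice for getting integers back after the NaN step.

```python
import pandas as pd

# Rows of the ("C", "c") combination from the question's data.
grp = pd.DataFrame({"Month": [7, 9], "Value": [40, 30]})

s = grp.set_index("Month")["Value"]
s = s.reindex(range(1, 13))   # months without a row become NaN
s = s.ffill()                 # carry the last known value forward
s = s.fillna(0).astype(int)   # months before the first value become 0
print(s.tolist())
# [0, 0, 0, 0, 0, 0, 40, 40, 30, 30, 30, 30]
```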
Then, to get your expected result, group df by ID1 and ID2 and apply the above function:
result = df.groupby(['ID1', 'ID2'], group_keys=False).apply(proc).reset_index()
The last step, reset_index(), converts the resulting (concatenated) Series into a DataFrame.
A fragment of the result for groups ('A', 'a') and ('C', 'c') is:
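As a small illustration of that last step, the sketch below builds a three-row Series with the same kind of MultiIndex by hand (the values are just the first three months of "A"+"a") and shows that reset_index() turns the index levels into ordinary columns.

```python
import pandas as pd

idx = pd.MultiIndex.from_product([["A"], ["a"], [1, 2, 3]],
                                 names=["ID1", "ID2", "Month"])
s = pd.Series([10, 10, 10], index=idx, name="Value")

frame = s.reset_index()  # index levels become columns, the Series name too
print(frame.columns.tolist())
# ['ID1', 'ID2', 'Month', 'Value']
```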
ID1 ID2 Month Value
0 A a 1 10
1 A a 2 10
2 A a 3 10
3 A a 4 60
4 A a 5 60
5 A a 6 60
6 A a 7 60
7 A a 8 60
8 A a 9 60
9 A a 10 60
10 A a 11 60
11 A a 12 60
...
36 C c 1 0
37 C c 2 0
38 C c 3 0
39 C c 4 0
40 C c 5 0
41 C c 6 0
42 C c 7 40
43 C c 8 40
44 C c 9 30
45 C c 10 30
46 C c 11 30
47 C c 12 30
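For comparison, here is a different way to get the same table without a per-group function. This is an alternative sketch, not the method used above: it builds every (ID1, ID2, Month) combination with a cross merge, left-joins the known values, and forward-fills within each group. It assumes pandas 1.2 or later (for `how="cross"`) and at most one row per combination and month; the names `combos`, `months`, `grid` and `full` are only for this example.

```python
import pandas as pd

df = pd.DataFrame({"ID1": ["A", "B", "C", "A", "C", "C", "A"],
                   "ID2": ["a", "b", "c", "a", "e", "c", "b"],
                   "Month": [1, 4, 7, 4, 2, 9, 3],
                   "Value": [10, 20, 40, 60, 20, 30, 10]})

# Every existing (ID1, ID2) pair crossed with months 1..12.
combos = df[["ID1", "ID2"]].drop_duplicates()
months = pd.DataFrame({"Month": range(1, 13)})
grid = combos.merge(months, how="cross")

# Attach the known values; months without a row get NaN.
full = grid.merge(df, on=["ID1", "ID2", "Month"], how="left")
full = full.sort_values(["ID1", "ID2", "Month"], ignore_index=True)

# Forward-fill within each combination, then use 0 before the first value.
full["Value"] = (full.groupby(["ID1", "ID2"])["Value"].ffill()
                     .fillna(0).astype(int))
```

On data with many combinations this avoids calling a Python function once per group, though whether it is actually faster depends on the data.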