Fill value in a pandas groupby object after filling missing dates
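Many similar questions have already been asked, and they helped me a lot. I followed these two in particular: "Filling missing dates with groupby in pandas" and "Adding missing dates to a DataFrame while keeping column/index values?"
But it still did not work.
I made a toy dataset to demonstrate the problem I am facing: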
import pandas as pd
data = pd.DataFrame({'Date': ['2012-01-01', '2012-01-01', '2012-01-01', '2012-01-02', '2012-01-02', '2012-01-02', '2012-01-03'], 'Id': ['21', '21', '22', '21', '22', '23', '21'], 'Quantity': ['5', '1', '4', '4', '2', '1', '4'], 'NetAmount': ['66', '45', '76', '35', '76', '73', '45']})
data['Quantity'] = data['Quantity'].astype('int')
data['NetAmount'] = data['NetAmount'].astype('float')
I grouped the dataset as shown in the code below:
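data['Date'] = pd.to_datetime(data['Date']) - pd.to_timedelta(7, unit='d')
# Weekly totals per Id; the double brackets select both value columns as a list.
data = data.groupby(['Id', pd.Grouper(key='Date', freq='W-MON')])[['Quantity', 'NetAmount']].sum().reset_index().sort_values('Date')
# Assign the result; a bare data.reset_index() returns a copy and changes nothing.
data = data.reset_index(drop=True)
data1 = data.groupby(['Id', 'Date']).agg({'Quantity': 'sum', 'NetAmount': 'sum'}).reset_index()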
Then I fill in the missing dates:
data2 = data1.set_index(['Date', 'Id','NetAmount']).Quantity.unstack(-3).\
reindex(columns=pd.date_range(data1['Date'].min(), data1['Date'].max(),freq='W-MON'),fill_value=0).\
stack(dropna=False).unstack().stack(dropna=False).\
unstack('NetAmount').stack(dropna=False).fillna(0).reset_index()
This gives the resulting DataFrame:
Id level_1 NetAmount 0
0 21 2011-12-26 45.0 0.0
1 21 2011-12-26 73.0 0.0
2 21 2011-12-26 146.0 10.0
3 21 2011-12-26 152.0 0.0
4 21 2012-01-02 45.0 4.0
5 21 2012-01-02 73.0 0.0
6 21 2012-01-02 146.0 0.0
7 21 2012-01-02 152.0 0.0
8 22 2011-12-26 45.0 0.0
9 22 2011-12-26 73.0 0.0
10 22 2011-12-26 146.0 0.0
11 22 2011-12-26 152.0 6.0
12 22 2012-01-02 45.0 0.0
13 22 2012-01-02 73.0 0.0
14 22 2012-01-02 146.0 0.0
15 22 2012-01-02 152.0 0.0
16 23 2011-12-26 45.0 0.0
17 23 2011-12-26 73.0 1.0
18 23 2011-12-26 146.0 0.0
19 23 2011-12-26 152.0 0.0
20 23 2012-01-02 45.0 0.0
21 23 2012-01-02 73.0 0.0
22 23 2012-01-02 146.0 0.0
23 23 2012-01-02 152.0 0.0
But what I actually expected to get is:
0 21 2011-12-26 66.0 5.0
1 21 2011-12-26 45.0 1.0
2 21 2011-12-26 35.0 4.0
3 21 2012-02-02 45.0 4.0
4 22 2011-12-26 76.0 4.0
5 22 2012-02-02 76.0 2.0
6 23 2011-12-26 0.0 0.0
7 23 2012-02-02 73.0 1.0
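The filling works, but I don't understand what exactly happened in the resulting DataFrame. The values in the NetAmount column, for example, are off. I am new to the unstack/stack functions. Am I missing something in the process? Thanks for any help!
Update: after adding the "0" values, I tried to regroup by Id and Date: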
data2 = pd.DataFrame(data2)
data3 = data2.groupby(['Id','Date']).agg({'Quantity': sum, 'NetAmount': sum}).reset_index()
But I get this error:
Traceback (most recent call last):
File "", line 48, in <module>
data3 = data2.groupby(['Id','Date']).agg({'Quantity': sum, 'NetAmount': sum}).reset_index()
File "", line 7632, in groupby
observed=observed, **kwargs)
File "", line 2110, in groupby
return klass(obj, by, **kwds)
File "", line 360, in __init__
mutated=self.mutated)
File "", line 578, in _get_grouper
raise KeyError(gpr)
KeyError: 'Date'
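The traceback is consistent with the column names in the printed result above: after `reset_index()` the frame has the columns `Id`, `level_1`, `NetAmount` and `0`, so there is no `Date` (or `Quantity`) column to group on. A minimal sketch of one way around it, assuming those column names, is to rename before regrouping. The small `data2` below is a hand-built stand-in with three of the printed rows, not the real intermediate frame:

```python
import pandas as pd

# Hand-built stand-in for data2, using the column names from the printed result.
data2 = pd.DataFrame({
    "Id": ["21", "21", "22"],
    "level_1": pd.to_datetime(["2011-12-26", "2012-01-02", "2011-12-26"]),
    "NetAmount": [146.0, 45.0, 152.0],
    0: [10.0, 4.0, 6.0],
})

# Give the unnamed levels the names that the later groupby expects.
data2 = data2.rename(columns={"level_1": "Date", 0: "Quantity"})

data3 = (
    data2.groupby(["Id", "Date"])
    .agg({"Quantity": "sum", "NetAmount": "sum"})
    .reset_index()
)
```

Naming the index levels before the final `reset_index()` in the original chain would likely avoid the rename step, but that is left as an assumption here.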
You need to convert the Quantity and NetAmount columns to numeric:
data['Quantity'] = data['Quantity'].astype('int')
data['NetAmount'] = data['NetAmount'].astype('float')
When the columns hold strings, the sum function concatenates all the strings in each group instead of adding them as numbers.
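A small sketch of that difference, using a hypothetical two-row frame (the rows for Id 21 on the first date) and forcing object dtype so the values are plain Python strings:

```python
import pandas as pd

# Hypothetical two-row frame; Quantity is stored as strings (object dtype).
df = pd.DataFrame({
    "Id": ["21", "21"],
    "Quantity": pd.Series(["5", "1"], dtype=object),
})

# Summing strings per group joins them end to end.
as_text = df.groupby("Id")["Quantity"].sum()

# After converting to int, the same call adds the numbers.
as_number = (
    df.assign(Quantity=df["Quantity"].astype(int))
    .groupby("Id")["Quantity"]
    .sum()
)
```

Here `as_text` holds the string `'51'` for Id 21, while `as_number` holds the integer 6.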
Now rerun your code; it should work as expected:
# Id level_1 NetAmount 0
#0 21 2011-12-26 45.0 0.0
#1 21 2011-12-26 73.0 0.0
#2 21 2011-12-26 146.0 10.0
#3 21 2011-12-26 152.0 0.0
#4 21 2012-01-02 45.0 4.0
#5 21 2012-01-02 73.0 0.0
#6 21 2012-01-02 146.0 0.0
#7 21 2012-01-02 152.0 0.0
#8 22 2011-12-26 45.0 0.0
#9 22 2011-12-26 73.0 0.0
#10 22 2011-12-26 146.0 0.0
#11 22 2011-12-26 152.0 6.0
#12 22 2012-01-02 45.0 0.0
#13 22 2012-01-02 73.0 0.0
#14 22 2012-01-02 146.0 0.0
#15 22 2012-01-02 152.0 0.0
#16 23 2011-12-26 45.0 0.0
#17 23 2011-12-26 73.0 1.0
#18 23 2011-12-26 146.0 0.0
#19 23 2011-12-26 152.0 0.0
#20 23 2012-01-02 45.0 0.0
#21 23 2012-01-02 73.0 0.0
#22 23 2012-01-02 146.0 0.0
#23 23 2012-01-02 152.0 0.0
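In the output above, NetAmount is one of the index levels, so every distinct NetAmount total (45, 73, 146, 152) appears as its own row for each Id and week. If the goal is one row per Id and week with both totals kept as values, one possible alternative (a sketch, not the method used in the question or in this answer) is to reindex on (Id, Date) only. The `data1` below is rebuilt by hand from the weekly totals visible in the output, and `weeks`, `full_index` and `filled` are illustrative names:

```python
import pandas as pd

# Weekly totals per Id, copied from the non-zero rows of the printed output.
data1 = pd.DataFrame({
    "Id": ["21", "21", "22", "23"],
    "Date": pd.to_datetime(["2011-12-26", "2012-01-02", "2011-12-26", "2011-12-26"]),
    "Quantity": [10, 4, 6, 1],
    "NetAmount": [146.0, 45.0, 152.0, 73.0],
})

# Every Id paired with every week between the first and last observed week.
weeks = pd.date_range(data1["Date"].min(), data1["Date"].max(), freq="W-MON")
full_index = pd.MultiIndex.from_product(
    [data1["Id"].unique(), weeks], names=["Id", "Date"]
)

# Reindex on (Id, Date) only, so NetAmount stays a value column
# and missing combinations get 0 in both value columns.
filled = (
    data1.set_index(["Id", "Date"])
    .reindex(full_index, fill_value=0)
    .reset_index()
)
```

With three Ids and two weeks this yields six rows, and the two combinations that had no data (Id 22 and Id 23 in the week of 2012-01-02) are filled with 0 in both Quantity and NetAmount.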