Create a DataFrame from the grouped columns of a grouped one
I have a grouped DataFrame and would like to prepare it for a regression.
>>>ts = sales.groupby(['date_block_num','shop_id'])['item_cnt_day'].sum()
date_block_num  shop_id
0               0          5578.0
                1          2947.0
                2          1146.0
                3           767.0
                4          2114.0
                            ...
33              1          1972.0
                2          1263.0
                3          2316.0
                4          1446.0
                5           790.0
I would like to get:
shop_id 1 ... 33
0 5578.0 1972.0
1 2947.0 1263.0
2 1146.0 2316.0
3 767.0 1446.0
4 2114.0 790.0
so that I can plot the result as a point cloud.
So far I have tried:
pd.DataFrame(columns=ts.date_block_num, index= ts.shop_id, data=ts)
But it returns:
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-30-82ae77e8cda8> in <module>()
----> 1 pd.DataFrame(columns=ts.date_block_num, index= ts.shop_id, data=ts)
/usr/local/lib/python3.6/dist-packages/pandas/core/generic.py in __getattr__(self, name)
5137 if self._info_axis._can_hold_identifiers_and_holds_name(name):
5138 return self[name]
-> 5139 return object.__getattribute__(self, name)
5140
5141 def __setattr__(self, name: str, value) -> None:
AttributeError: 'Series' object has no attribute 'date_block_num'
The original data looks like this:
>>> sales
date date_block_num shop_id item_id item_price item_cnt_day
0 2013-01-02 0 59 22154 999.00 1.0
1 2013-01-03 0 25 2552 899.00 1.0
2 2013-01-05 0 25 2552 899.00 -1.0
3 2013-01-06 0 25 2554 1709.05 1.0
4 2013-01-15 0 25 2555 1099.00 1.0
...
I think you are looking for unstack:
out = (sales.groupby(['date_block_num','shop_id'])
['item_cnt_day'].sum()
.unstack('date_block_num')
)
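For context on the error in the question: after groupby(...).sum() the keys date_block_num and shop_id are levels of the Series' MultiIndex rather than columns, which is why ts.date_block_num raises AttributeError. Here is a minimal sketch on a made-up toy frame (the values are illustrative only and do not come from the real sales data):

```python
import pandas as pd

# Toy stand-in for `sales` (hypothetical values, for illustration only)
sales = pd.DataFrame({
    "date_block_num": [0, 0, 0, 1, 1, 1],
    "shop_id":        [0, 0, 1, 0, 1, 1],
    "item_cnt_day":   [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

ts = sales.groupby(["date_block_num", "shop_id"])["item_cnt_day"].sum()

# The group keys live in the index, not in attributes or columns
print(list(ts.index.names))

# unstack moves the 'date_block_num' level from the rows to the columns,
# leaving one row per shop_id and one column per date_block_num
out = ts.unstack("date_block_num")
print(out)
```

If the index values themselves are needed, ts.index.get_level_values('shop_id') returns them without reshaping.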
You can also do it this way:
out = sales.pivot_table(index='shop_id',
columns='date_block_num',
values='item_cnt_day',
aggfunc='sum')
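As a rough check, again on made-up data, the two approaches should produce the same table. A shop with no sales in a given month comes out as NaN, and since a regression usually cannot take NaN, the sketch below also uses pivot_table's fill_value argument to put 0 there instead (whether 0 is the right fill is an assumption about the use case):

```python
import pandas as pd

# Toy stand-in for `sales` (hypothetical values); shop 1 has no sales in
# month 1 and shop 2 has none in month 0
sales = pd.DataFrame({
    "date_block_num": [0, 0, 1, 1, 1],
    "shop_id":        [0, 1, 0, 0, 2],
    "item_cnt_day":   [1.0, 2.0, 3.0, 4.0, 5.0],
})

via_unstack = (sales.groupby(["date_block_num", "shop_id"])["item_cnt_day"]
                    .sum()
                    .unstack("date_block_num"))

via_pivot = sales.pivot_table(index="shop_id",
                              columns="date_block_num",
                              values="item_cnt_day",
                              aggfunc="sum")

# Missing (shop, month) combinations are NaN in both results
print(via_unstack)
print(via_pivot)

# Same reshaping, with missing combinations replaced by 0
filled = sales.pivot_table(index="shop_id",
                           columns="date_block_num",
                           values="item_cnt_day",
                           aggfunc="sum",
                           fill_value=0)
print(filled)
```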