Using Pandas pd.pivot_table to pivot by date
I'm still very new to pandas and Python, and I'm afraid I'm doing something foolish here. That said, the closest thing I could find to the problem I'm encountering is "How to create pivot with totals (margins) in Pandas?", so I am asking.
I've got a simple dataframe with 3 columns.
   Account ID  Amount  Close Date
0         10a     100  2009-01-01
1         10a      50  2009-01-01
2         10a     100  2010-04-01
3         10a     100  2011-04-01
4         10a     100  2012-05-01
..        ...     ...         ...
35         4b      .5  2009-01-01
36         4c      .5  2009-01-01
37         5a      .5  2009-01-01
38         5b      .5  2009-01-01
39         8a      .5  2009-01-01
I think I'm having trouble with the Close Date column. I suspect that somehow pandas doesn't realize that one 2009-01-01 equals another 2009-01-01.
I'd like to pivot this table to get output like the one below, where rows are grouped first by account ID and then by close date. If an account ID has multiple rows with the same close date, I'd like those amounts to be added up in the values column. (For the record, I'm really only interested in the year, but while troubleshooting I've been trying to simplify as much as possible.)
Account ID  Close Date
2c          2009-01-01    100
            2011-01-01    100
10a         2009-01-01    150
            2010-04-01    100
...
I've tried a variety of things, and I keep running into problems that make me think I've got some kind of date problem. Maybe I need to import a different library?
Here's my latest attempt:
pd.pivot_table(opps, index=['Account ID'], columns='Close Date', values=['Amount'], aggfunc=np.sum)
and the output is very close to what I want.
The only problem is that for any account ID that has two or more rows for the same date, that data just disappears from the output. Account 10a has 3 rows for 2009-01-01, but the pivot table shows NaN for 2009-01-01.
I thought I'd try the same pivot table with margins = True.
When I did that, I got an error message.
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-182-f8dc0d75c868> in <module>()
3 margins = "True",
4 values=['Amount'],
----> 5 aggfunc=np.sum)
/Applications/anaconda/lib/python2.7/site-packages/pandas/tools/pivot.pyc in pivot_table(data, values, index, columns, aggfunc, fill_value, margins, dropna)
141 if margins:
142 table = _add_margins(table, data, values, rows=index,
--> 143 cols=columns, aggfunc=aggfunc)
144
145 # discard the top level
/Applications/anaconda/lib/python2.7/site-packages/pandas/tools/pivot.pyc in _add_margins(table, data, values, rows, cols, aggfunc)
167
168 if values:
--> 169 marginal_result_set = _generate_marginal_results(table, data, values, rows, cols, aggfunc, grand_margin)
170 if not isinstance(marginal_result_set, tuple):
171 return marginal_result_set
/Applications/anaconda/lib/python2.7/site-packages/pandas/tools/pivot.pyc in _generate_marginal_results(table, data, values, rows, cols, aggfunc, grand_margin)
236 # we are going to mutate this, so need to copy!
237 piece = piece.copy()
--> 238 piece[all_key] = margin[key]
239
240 table_pieces.append(piece)
/Applications/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
1795 return self._getitem_multilevel(key)
1796 else:
-> 1797 return self._getitem_column(key)
1798
1799 def _getitem_column(self, key):
/Applications/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
1802 # get column
1803 if self.columns.is_unique:
-> 1804 return self._get_item_cache(key)
1805
1806 # duplicate columns & possible reduce dimensionaility
/Applications/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
1082 res = cache.get(item)
1083 if res is None:
-> 1084 values = self._data.get(item)
1085 res = self._box_item_values(item, values)
1086 cache[item] = res
/Applications/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
2849
2850 if not isnull(item):
-> 2851 loc = self.items.get_loc(item)
2852 else:
2853 indexer = np.arange(len(self.items))[isnull(self.items)]
/Applications/anaconda/lib/python2.7/site-packages/pandas/core/index.pyc in get_loc(self, key, method)
1570 """
1571 if method is None:
-> 1572 return self._engine.get_loc(_values_from_object(key))
1573
1574 indexer = self.get_indexer([key], method=method)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3824)()
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3704)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12280)()
pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12231)()
KeyError: Timestamp('2009-01-01 00:00:00')
Thanks for any advice you can offer.
It sounds like a group by rather than a pivot table to me: your columns are fixed.
For example:
import pandas as pd
from datetime import date
df = pd.DataFrame(data=[['10a', 100, date(2009, 1, 1)],
['10a', 50, date(2009, 1, 1)],
['10a', 100, date(2010, 4, 1)],
['10a', 100, date(2011, 4, 1)],
['10a', 100, date(2012, 5, 1)],
['4b', .5, date(2009, 1, 1)],
['4c', .5, date(2009, 1, 1)],
['5a', .5, date(2009, 1, 1)],
['5b', .5, date(2009, 1, 1)],
['8a', .5, date(2009, 1, 1)]],
columns=['Account ID', 'Amount', 'Close Date'])
df.groupby(['Account ID', 'Close Date']).sum()
gives:
                       Amount
Account ID Close Date
10a        2009-01-01   150.0
           2010-04-01   100.0
           2011-04-01   100.0
           2012-05-01   100.0
4b         2009-01-01     0.5
4c         2009-01-01     0.5
5a         2009-01-01     0.5
5b         2009-01-01     0.5
8a         2009-01-01     0.5
Apologies if I've missed something.
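On the suspicion in the question that equal dates are not being treated as equal: one thing worth checking is what type the Close Date column actually holds. The sketch below is only an illustration under assumptions. It uses a made-up three-row `opps` frame whose dates start out as plain strings, converts them with `pd.to_datetime`, and then sums the duplicates. Whether this is what went wrong in the original data cannot be told from the post.

```python
import pandas as pd

# Hypothetical sample: the dates are stored as plain strings here.
opps = pd.DataFrame({
    "Account ID": ["10a", "10a", "10a"],
    "Amount": [100, 50, 100],
    "Close Date": ["2009-01-01", "2009-01-01", "2010-04-01"],
})

# Check whether pandas already sees the column as datetimes.
was_datetime = pd.api.types.is_datetime64_any_dtype(opps["Close Date"])

# Convert to Timestamps and drop any time-of-day component so that
# two rows for the same day compare equal.
opps["Close Date"] = pd.to_datetime(opps["Close Date"]).dt.normalize()
is_datetime = pd.api.types.is_datetime64_any_dtype(opps["Close Date"])

# Rows sharing an account and a date are now added together.
totals = opps.groupby(["Account ID", "Close Date"])["Amount"].sum()
print(totals)
```

If the column is already a datetime column, the conversion leaves the values unchanged apart from removing any time of day.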
The equivalent with pivot table is:
import numpy as np
df.pivot_table(index=['Account ID', 'Close Date'], values=['Amount'], aggfunc=np.sum)
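Since the question mentions that only the year really matters, here is a rough sketch of how the same idea could be taken to year level, along with a wide layout that includes totals. The `Close Year` column is a hypothetical helper name that does not appear in the original post, and the rows are a small subset of the sample data above. `margins` is passed as the boolean `True` rather than a string.

```python
import pandas as pd

# A few rows from the sample data, with dates given as strings.
df = pd.DataFrame(
    [["10a", 100, "2009-01-01"],
     ["10a", 50, "2009-01-01"],
     ["10a", 100, "2010-04-01"],
     ["4b", 0.5, "2009-01-01"]],
    columns=["Account ID", "Amount", "Close Date"],
)
df["Close Date"] = pd.to_datetime(df["Close Date"])

# Hypothetical helper column holding just the year of each close date.
df["Close Year"] = df["Close Date"].dt.year

# Long layout: one row per (account, year), duplicates summed.
by_year = df.groupby(["Account ID", "Close Year"])["Amount"].sum()

# Wide layout: one column per year, plus "All" totals for rows and columns.
wide = df.pivot_table(index="Account ID", columns="Close Year",
                      values="Amount", aggfunc="sum",
                      fill_value=0, margins=True)
print(by_year)
print(wide)
```

Pivoting on an integer year rather than on the full timestamp also keeps the column labels simple, which may or may not sidestep the error shown in the traceback depending on the pandas version in use.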