Missing data in pandas.crosstab

Question

I'm making some crosstabs with pandas:

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

b     one   two       
c    dull  dull  shiny
a                     
bar     1     1      0
foo     2     1      2

But what I actually want is the following:

b     one          two       
c    dull  shiny  dull  shiny
a                            
bar     1      0     1      0
foo     2      0     1      2

I found a workaround by adding a new column and setting the levels as a new MultiIndex, but it seems more complicated than it should be...

Is there any way to pass a MultiIndex to the crosstab function to predefine the output columns?

Answer 1

The crosstab function has a parameter called dropna, which is set to True by default. It controls whether columns whose entries are all empty (such as the one/shiny column) are dropped from the result or displayed.

I tried calling the function like this:

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'], dropna=False)

and this is what I got:

b     one          two       
c    dull  shiny  dull  shiny
a                            
bar     1      0     1      0
foo     2      0     1      2

Hope that is still helpful.
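For anyone who wants to check this quickly, here is a minimal self-contained sketch of the dropna=False call using the arrays from the question. The variable name ct is only illustrative, and the exact printed layout may differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Sample data from the question
a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

# dropna=False keeps column combinations that never occur in the data,
# such as ('one', 'shiny'); crosstab fills their counts with 0.
ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'], dropna=False)
print(ct)
```

The ('one', 'shiny') column should now be present with zero counts for both rows.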

Answer 2

I don't think there is a way to do this: crosstab calls pivot_table in the source, which doesn't seem to offer this either. I raised it as an issue here.

A hacky workaround (which may or may not be the same as the one you were already using...):

from itertools import product
ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
a_x_b = list(product(np.unique(b), np.unique(c)))
a_x_b = pd.MultiIndex.from_tuples(a_x_b)

# Note: reindex_axis was removed in pandas 1.0; reindex(columns=...) is the equivalent call.
In [15]: ct.reindex(columns=a_x_b, fill_value=0)
Out[15]:
      one          two
     dull  shiny  dull  shiny
a
bar     1      0     1      0
foo     2      0     1      2

If product is too slow, here is a numpy implementation of it.
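As a rough sketch of the same workaround on a recent pandas, the column grid can also be built with pd.MultiIndex.from_product, which avoids itertools entirely. The names full_cols and ct_full are illustrative, not from the original answer, and passing names=['b', 'c'] is an optional addition to keep the column level labels.

```python
import numpy as np
import pandas as pd

# Sample data from the question
a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

# Build every (b, c) combination up front, then reindex the columns onto it.
full_cols = pd.MultiIndex.from_product(
    [np.unique(b), np.unique(c)], names=['b', 'c']
)

# fill_value=0 fills the missing combinations directly, so the counts stay integers.
ct_full = ct.reindex(columns=full_cols, fill_value=0)
print(ct_full)
```

Using fill_value=0 rather than a later fillna(0) is a small design choice: it avoids the intermediate NaN values that would otherwise turn the counts into floats.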
