Missing data in pandas.crosstab

I'm making some crosstabs with pandas:

import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

b     one   two       
c    dull  dull  shiny
a                     
bar     1     1      0
foo     2     1      2

But what I actually want is the following:

b     one          two
c    dull  shiny  dull  shiny
a
bar     1      0     1      0
foo     2      0     1      2

I found a workaround by adding a new column and setting the levels as a new MultiIndex, but it seems cumbersome...

Is there any way to pass a MultiIndex to the crosstab function to predefine the output columns?

The crosstab function has a parameter called dropna, which is set to True by default. This parameter controls whether columns with no entries at all (such as the one-shiny column) are dropped from the output.

I tried calling the function like this:

pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'], dropna=False)

and this is what I got:

b     one          two       
c    dull  shiny  dull  shiny
a                            
bar     1      0     1      0
foo     2      0     1      2

Hope that was still helpful.

I don't think there is a way to do this, and crosstab calls pivot_table in the source, which doesn't seem to offer this either. I raised it as an issue here.

A hacky workaround (which may or may not be the same as the one you were already using...):

from itertools import product
ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])
# every (b, c) combination, including ones that never occur in the data
b_x_c = list(product(np.unique(b), np.unique(c)))
b_x_c = pd.MultiIndex.from_tuples(b_x_c)

In [15]: ct.reindex(columns=b_x_c, fill_value=0)
Out[15]:
      one          two
     dull  shiny  dull  shiny
a
bar     1      0     1      0
foo     2      0     1      2
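
The same workaround can probably be written a little more compactly with pd.MultiIndex.from_product, which builds the full set of column combinations directly. This is only a sketch on the question's data; the variable names full_cols and full are mine, and passing names=['b', 'c'] is an assumption made so that the column level names survive the reindex.

```python
import numpy as np
import pandas as pd

a = np.array(['foo', 'foo', 'foo', 'bar', 'bar', 'foo', 'foo'], dtype=object)
b = np.array(['one', 'one', 'two', 'one', 'two', 'two', 'two'], dtype=object)
c = np.array(['dull', 'dull', 'dull', 'dull', 'dull', 'shiny', 'shiny'], dtype=object)

ct = pd.crosstab(a, [b, c], rownames=['a'], colnames=['b', 'c'])

# Every (b, c) combination, observed or not. The names are set to match
# the crosstab's column levels (an illustrative choice, not required).
full_cols = pd.MultiIndex.from_product(
    [np.unique(b), np.unique(c)], names=['b', 'c']
)

# fill_value=0 puts a zero count in combinations that never occurred,
# so no NaN is introduced and the counts stay integers.
full = ct.reindex(columns=full_cols, fill_value=0)
print(full)
```

Because the missing combinations are filled during the reindex itself, no separate fillna step is needed.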

If product is too slow, here is a numpy implementation of it.
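
For reference, one way such a numpy-based product could look is sketched below. It is not necessarily the implementation the answer linked to; the helper name cartesian_product and the use of np.meshgrid are my own assumptions.

```python
import numpy as np

def cartesian_product(*arrays):
    """Return every combination of the inputs as rows of a 2-D array.

    Hypothetical helper for illustration; rows come out in the same
    order as itertools.product would yield them.
    """
    # indexing='ij' makes the first array vary slowest, like itertools.product
    grids = np.meshgrid(*arrays, indexing='ij')
    return np.stack([g.ravel() for g in grids], axis=1)

b_levels = np.array(['one', 'two'])
c_levels = np.array(['dull', 'shiny'])

pairs = cartesian_product(b_levels, c_levels)
print(pairs)
```

The columns of the result can then be handed to pd.MultiIndex.from_arrays(pairs.T) to build the column index used for the reindex.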
