Making a bubble chart from a pandas crosstab

I have a pandas dataframe with 4 columns and a few thousand rows. All entries are either True or False. Let's call the dataframe 'df' and the columns 'c0', 'c1', 'c2', and 'c3'. I'm interested in how many rows have each of the 2^4 = 16 possible combinations of truth values, so I make myself a cross-tabulation:

xt = pd.crosstab([df.c0,df.c1],[df.c2,df.c3])
print(xt)

That displays a nice 4x4 table of cells, each containing the count of rows that have that combination of truth values. Even better, the spatial layout of those 16 cells is meaningful and useful to me. OK, all's well. But how do I plot it?
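One caveat before plotting: `pd.crosstab` only creates rows and columns for combinations that actually occur, so with sparse data the table can come out smaller than 4x4, and any code that assumes a 4x4 array then fails. A minimal sketch, assuming randomly generated sample data in place of the real `df`, reindexes the crosstab against every False/True pair with `pd.MultiIndex.from_product`, filling absent combinations with zero:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real df: 4 boolean columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1000, 4)) < 0.6, columns=["c0", "c1", "c2", "c3"])

xt = pd.crosstab([df.c0, df.c1], [df.c2, df.c3])

# crosstab only lists combinations that occur, so reindex against every
# (False/True, False/True) pair to guarantee the full 4x4 layout.
full_rows = pd.MultiIndex.from_product([[False, True]] * 2, names=["c0", "c1"])
full_cols = pd.MultiIndex.from_product([[False, True]] * 2, names=["c2", "c3"])
xt = xt.reindex(index=full_rows, columns=full_cols, fill_value=0)

print(xt.shape)  # (4, 4)
```

Because the existing labels are a subset of the full product, the reindex only adds zero-filled cells and never drops a count.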

Specifically, I'd like to make a bubble chart of those crosstab counts, i.e. a graphical representation of the crosstab data in the same spatial arrangement as the table, but with each number replaced by a colored blob (say, a circle) whose area is proportional to the count. So, that's a scatter plot with the four (c0, c1) truth values along one axis, the four (c2, c3) truth values along the other axis, and a 4x4 regular grid of variously sized circles.

I know that I can make a bubble chart by passing size data to the 's' keyword of matplotlib's scatter function, but I can't figure out a simple way of telling pandas to make a scatter plot that uses column headings as x-coordinates, row headings as y-coordinates, and data values as bubble sizes. I've had some luck converting my dataframe to a numpy array and plotting that, but then I lose the structure of the axis labels from the crosstab. (Yes, I could rebuild the tick labels by hand, but I'd like to be able to reproduce this task algorithmically for other, similar data sets.)
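One way to keep the labels without typing them by hand is to plot the raw array with matplotlib and then generate the tick labels from the crosstab's own MultiIndex. The sketch below assumes hypothetical random sample data and a hypothetical helper `tuple_labels`; it uses `np.meshgrid` to give every cell an (x, y) grid position and relies on matplotlib's `s` being a marker area (in points squared), so bubble area scales linearly with the count:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's df.
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.random((500, 4)) < 0.6, columns=["c0", "c1", "c2", "c3"])
xt = pd.crosstab([df.c0, df.c1], [df.c2, df.c3])


def tuple_labels(index):
    """Turn each MultiIndex entry into a label like 'c2=False, c3=True'."""
    return [", ".join(f"{name}={value}" for name, value in zip(index.names, key))
            for key in index]


nrows, ncols = xt.shape
# One grid point per crosstab cell: column position -> x, row position -> y.
xx, yy = np.meshgrid(np.arange(ncols), np.arange(nrows))

fig, ax = plt.subplots()
# ravel() is row-major on both arrays, so sizes line up with grid positions.
ax.scatter(xx.ravel(), yy.ravel(), s=xt.to_numpy().ravel() * 10)
ax.set_xticks(np.arange(ncols))
ax.set_xticklabels(tuple_labels(xt.columns), rotation=45, ha="right")
ax.set_yticks(np.arange(nrows))
ax.set_yticklabels(tuple_labels(xt.index))
ax.invert_yaxis()  # first crosstab row at the top, as in the printed table
fig.tight_layout()
```

Since the labels come from `xt.index` and `xt.columns` rather than being hard-coded, the same code should work for other crosstabs with different column names or more index levels.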

EDIT: Inspired by the answer from @piRSquared below, here's some clarification of what I'm asking for. This code comes close to what I want, but the axes on the resulting plot have lost all information about the layered MultiIndex label structure of the crosstab object.

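import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

randomData = np.random.choice([True, False], size=(100, 4), p=[.6, .4])
df = pd.DataFrame(randomData, columns=['c0', 'c1', 'c2', 'c3'])
xt = pd.crosstab([df.c0, df.c1], [df.c2, df.c3])

# x = column position, y = row position (flipped so row 0 is at the top).
# Assumes all 16 combinations occur, i.e. xt is 4x4.
x = np.array([range(4)] * 4)
y = x.transpose()[::-1, :]
plt.scatter(x, y, s=np.array(xt) * 10)
plt.show()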

(Link to plot image, since I don't have enough reputation to embed: a scatter plot with poorly labelled axes.) Ideally, the axis labels would have a visually layered structure derived from the underlying MultiIndex of the crosstab object, something like this:

c2          False       True       
c3          False True  False True 
c0    c1                           
False False     0     8     4     9
      True      3     2     4    10
True  False     7     5     3    10
      True      2     7     8    18
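Matplotlib has no built-in notion of a MultiIndex axis, but the two-level layout of the table above can be approximated: put the inner level (`c3`, `c1`) on the ordinary tick labels, and draw one centred text label per run of outer-level values (`c2`, `c0`) just outside the axes, using `Axes.get_xaxis_transform()` / `get_yaxis_transform()` so one coordinate is in data units and the other in axes units. This is a sketch, not a pandas feature: `outer_groups` is a hypothetical helper, the sample data is random, and the text offsets (`-0.12`, `-0.3`) and margins are hand-tuned guesses that may need adjusting for other label lengths:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's df.
rng = np.random.default_rng(2)
df = pd.DataFrame(rng.random((500, 4)) < 0.6, columns=["c0", "c1", "c2", "c3"])
xt = pd.crosstab([df.c0, df.c1], [df.c2, df.c3])


def outer_groups(index):
    """Return (outer_value, first_pos, last_pos) for each run of level-0 values."""
    groups = []
    for pos, key in enumerate(index):
        if groups and groups[-1][0] == key[0]:
            groups[-1][2] = pos  # extend the current run
        else:
            groups.append([key[0], pos, pos])  # start a new run
    return [tuple(g) for g in groups]


nrows, ncols = xt.shape
xx, yy = np.meshgrid(np.arange(ncols), np.arange(nrows))

fig, ax = plt.subplots()
ax.scatter(xx.ravel(), yy.ravel(), s=xt.to_numpy().ravel() * 10)

# Inner level (c3 on x, c1 on y) as ordinary tick labels.
ax.set_xticks(np.arange(ncols))
ax.set_xticklabels([f"{xt.columns.names[1]}={k[1]}" for k in xt.columns])
ax.set_yticks(np.arange(nrows))
ax.set_yticklabels([f"{xt.index.names[1]}={k[1]}" for k in xt.index])
ax.invert_yaxis()  # first crosstab row at the top

# Outer level (c2 on x, c0 on y): one centred label per group, outside the axes.
for value, first, last in outer_groups(xt.columns):
    ax.text((first + last) / 2, -0.12, f"{xt.columns.names[0]}={value}",
            transform=ax.get_xaxis_transform(), ha="center", va="top")
for value, first, last in outer_groups(xt.index):
    ax.text(-0.3, (first + last) / 2, f"{xt.index.names[0]}={value}",
            transform=ax.get_yaxis_transform(), ha="right", va="center")

fig.subplots_adjust(left=0.35, bottom=0.2)
```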

Or, perhaps, something reminiscent of what the legend and x-axis express here:

xt.plot(kind='bar',stacked=True)

(Another plot image link: a stacked bar plot that knows about the MultiIndex nature of its underlying DataFrame.)

Hope this helps

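import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# 100 random pairs of integers 0-5; larger values are more likely (p = 1/21 ... 6/21)
df = pd.DataFrame(
    np.random.choice(
        np.arange(6),
        size=(100, 2),
        p=np.arange(1, 7) / 21
    ), columns=list('AB')
)

# Long format: one row per (A, B) cell, with its count in column C
c = pd.crosstab(df.A, df.B).stack().reset_index(name='C')

c.plot.scatter('A', 'B', s=c.C * 10)
plt.show()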
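The answer's long-format idea (one row per cell, then `DataFrame.plot.scatter`) can be carried over to the question's MultiIndex crosstab with one extra step: boolean tuples like `(False, True)` are not usable as coordinates, so each tuple is flattened to a string label and then mapped to an integer grid position. A sketch assuming random sample data; it uses `melt` rather than `stack` only to avoid version-dependent `stack` behaviour (pandas 2.1 introduced a `future_stack` option):

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's df.
rng = np.random.default_rng(3)
df = pd.DataFrame(rng.random((500, 4)) < 0.6, columns=["c0", "c1", "c2", "c3"])
xt = pd.crosstab([df.c0, df.c1], [df.c2, df.c3])

# Flatten each MultiIndex tuple into one string label, e.g. "False, True".
flat = xt.copy()
flat.index = [", ".join(str(v) for v in k) for k in xt.index]
flat.columns = [", ".join(str(v) for v in k) for k in xt.columns]
flat.index.name = "row"

# Long format: one row per cell, like the answer's stack().reset_index().
long = flat.reset_index().melt(id_vars="row", var_name="col", value_name="count")

# Map labels to integer grid positions, keeping the crosstab's order.
long["x"] = long["col"].map({lab: i for i, lab in enumerate(flat.columns)})
long["y"] = long["row"].map({lab: i for i, lab in enumerate(flat.index)})

ax = long.plot.scatter("x", "y", s=long["count"] * 10)
ax.set_xticks(range(len(flat.columns)))
ax.set_xticklabels(flat.columns)
ax.set_yticks(range(len(flat.index)))
ax.set_yticklabels(flat.index)
ax.set_xlabel(", ".join(xt.columns.names))  # "c2, c3"
ax.set_ylabel(", ".join(xt.index.names))    # "c0, c1"
ax.invert_yaxis()  # first crosstab row at the top
plt.show()
```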
