
Pandas groupby two columns and create a plot of count totals

I'm new to Pandas and I'm looking for a way to plot data that has been grouped by two columns. Here's my example:

First I group by the 'Date' (year) and 'Primary Type' columns.

groups = df.groupby([df['Date'].map(lambda x: x.year), pri_type['Primary Type']])

Now from that I can get a series that is basically exactly what I want to plot.

groups.size().head()

Date  Primary Type        
2001  ARSON                   1010
      ASSAULT                31384
      BATTERY                93448
      BURGLARY               26011
      CRIM SEXUAL ASSAULT     1794 
dtype: int64
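
For anyone who wants to reproduce the grouping step without the original dataset, here is a minimal sketch. The toy rows are made up for illustration, and it assumes 'Date' is already a datetime column, so `.dt.year` can stand in for the `map(lambda x: x.year)` call above.

```python
import pandas as pd

# Hypothetical toy data standing in for the real dataset (not from the post).
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2001-01-05", "2001-03-17", "2001-07-22",
        "2002-02-11", "2002-06-30", "2002-09-09",
    ]),
    "Primary Type": ["ARSON", "ASSAULT", "ASSAULT", "ARSON", "ARSON", "BATTERY"],
})

# Group by the year of 'Date' and by 'Primary Type', then count rows per group.
# The result is a Series with a two-level index (Date, Primary Type).
counts = df.groupby([df["Date"].dt.year, "Primary Type"]).size()
print(counts.head())
```

Each (year, type) pair is a separate entry in this Series, which is why plotting it directly puts one label per pair on the x axis.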

But when I plot this, I get a very messy plot with thousands of labels on the x axis. What I would like is a plot with the date on the x axis and a legend listing all the Primary Types. Something similar to this graph:

[Example graph]

Thanks in advance!

What do you want to be displayed on the x axis, the date? If so, you can set the date as the index: groups.set_index('Date')

The solution that I came up with is to convert the series to a data frame and use the unstack() method. Here is what I did:

# convert to a dataframe
df = groups.size().to_frame()

|       |               |  0
|------ | --------------|------
|Date   | Primary Type  |
|       | ARSON         | 1010
|       | ASSAULT       | 31384
| 2001  | BATTERY       | 93234
|       | BURGLARY      | 26031
|       | CRIM SEXUAL AS| 1723

# unstack() to pivot the data which puts it in the correct format for plot()
df.unstack(level=-1)

|            |0                    
|------------|-------|---------|-------...
|Primary Type|ARSON  |ASSAULT  |BATTERY...
|Date        |       |         |       ...
|2001        |1010.0 |31384.0  |93234.0...
|2002        |2938.0 |31993.0  |94235.0...
|2003        |955.0  |30082.0  |92834.0...
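
The extra 0 in the header of the table above is the default column name that to_frame() gives an unnamed Series. One possible way to avoid it is to call unstack() on the Series itself, as in the sketch below. The toy data is made up, and `fill_value=0` is my own addition: it fills year/type pairs that never occur so the counts stay integers instead of turning into floats.

```python
import pandas as pd

# Hypothetical toy data; the real dataset is not shown in the post.
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2001-01-05", "2001-03-17", "2002-02-11", "2002-09-09"]
    ),
    "Primary Type": ["ARSON", "ASSAULT", "ARSON", "BATTERY"],
})

counts = df.groupby([df["Date"].dt.year, "Primary Type"]).size()

# Unstack the Series directly: years become rows, crime types become columns.
# No to_frame() step, so there is no extra column level named 0.
wide = counts.unstack(level=-1, fill_value=0)
print(wide)
```

The resulting frame has a single level of column labels, one per Primary Type, which is the shape plot() expects.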

This produces almost the graph I was after, apart from the extra 0 label, which I can probably get rid of. As you can see, it's still not very readable, but it answers my question of how to graph it.

df.unstack(level=-1).plot(kind='bar', figsize = (10,10))

[Final graph]
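
If the bar chart is too crowded, a line plot with one line per Primary Type may be closer to what the question describes (date on the x axis, types in the legend). This is only a sketch on made-up toy data; the axis labels and the Agg backend are my own choices, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this runs without a display
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2001-01-05", "2001-03-17", "2001-07-22",
        "2002-02-11", "2002-06-30", "2002-09-09",
    ]),
    "Primary Type": ["ARSON", "ASSAULT", "ASSAULT", "ARSON", "ARSON", "BATTERY"],
})

# Rows are years, columns are crime types, values are counts.
wide = (
    df.groupby([df["Date"].dt.year, "Primary Type"])
    .size()
    .unstack(fill_value=0)
)

# One line per column; the column names become the legend entries.
ax = wide.plot(kind="line", marker="o", figsize=(10, 6))
ax.set_xlabel("Year")
ax.set_ylabel("Number of incidents")
ax.set_xticks(list(wide.index))  # tick only the years that are present
ax.legend(title="Primary Type")
```

In an interactive session you would drop the `matplotlib.use("Agg")` line and call `matplotlib.pyplot.show()` to display the figure.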
