
Multiple boxplots by date in index

My dataframe:

index   Dates        Hours_played
0       2014-11-06   11
1       2014-12-06   4
2       2015-09-06   5
3       2015-07-06   5

Then I set Dates as the index:

             Hours_played
Dates        
2014-11-06   11
2014-12-06   4
2015-09-06   5
2015-07-06   5

The problem: when I tried to create one box plot for each year found in the index, I got both plots on the same grid.

df.loc['2014']['Hours_played'].plot.box(ylim=(0,200))
df.loc['2015']['Hours_played'].plot.box(ylim=(0,200))

[Image: box plot]

I tried the following, but the plot comes up empty:

data_2015 = df.loc['2015']['Hours_played']
data_2016 = df.loc['2016']['Hours_played']
data_to_plot = [data_2015, data_2016]

mpl_fig = plt.figure()
ax = mpl_fig.add_subplot(111)
ax.boxplot(data_to_plot)
ax.set_ylim(0,300)

[Image: boxplot2]

Is it possible to have them in the same grid, side by side?

A simple solution is to group by year first and then draw the box plots:

import io

import matplotlib.pyplot as plt
import pandas as pd

# Re-create your sample data
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=True)

# The following code is the part relevant to your question.
df.groupby(df.index.year).boxplot()
plt.show()

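If the goal is one set of axes rather than one subplot per year, the grouped `boxplot` method also accepts `subplots=False`. The following is a minimal sketch on the same sample data, not a definitive recipe; the `Agg` backend line is an assumption added only so that the sketch runs without a display, and it can be dropped in an interactive session.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Re-create the sample data from the question
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=True)

# subplots=False draws every group's box on a single set of axes
ax = df.groupby(df.index.year).boxplot(subplots=False)
ax.set_ylabel("Hours_played")
```

With the default `subplots=True`, each year gets its own subplot instead.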

Your second method ends up with an empty plot because matplotlib fails to recognize the pandas object correctly. Try passing a NumPy array representation instead:

import io

import matplotlib.pyplot as plt
import pandas as pd

# Re-create your sample data
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=True)

# The following code is the part relevant to your question.
# Select the column so each entry is a 1-D NumPy array, which ax.boxplot expects.
data_2014 = df.loc[df.index.year == 2014, 'Hours_played'].to_numpy()
data_2015 = df.loc[df.index.year == 2015, 'Hours_played'].to_numpy()
data_to_plot = [data_2014, data_2015]

mpl_fig = plt.figure()
ax = mpl_fig.add_subplot(111)
ax.boxplot(data_to_plot)

plt.show()

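The same idea can be written without hard-coding the years. The sketch below assumes the sample data from the question; `per_year` is a hypothetical helper name, and the `Agg` backend line is only there so the code runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=True)

# Build one 1-D NumPy array per year found in the index
per_year = {
    year: group["Hours_played"].to_numpy()
    for year, group in df.groupby(df.index.year)
}

fig, ax = plt.subplots()
bp = ax.boxplot(list(per_year.values()))

# Boxes are drawn at positions 1..n by default; label each with its year
ax.set_xticks(range(1, len(per_year) + 1))
ax.set_xticklabels([str(year) for year in per_year])
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result.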

To use subplots, you will need to plot them one by one:

import io

import matplotlib.pyplot as plt
import pandas as pd

# Re-create your sample data
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), parse_dates=[0])
df['Year'] = df.Dates.dt.year
df.set_index(['Year', 'Dates'], inplace=True)

# The following code is the part relevant to your question.
mpl_fig = plt.figure()
ax1 = mpl_fig.add_subplot(121)
ax1.boxplot(df.loc[2014]['Hours_played'].to_numpy())
ax1.set_xticklabels([2014])  # label the single box with its year
ax2 = mpl_fig.add_subplot(122)
ax2.boxplot(df.loc[2015]['Hours_played'].to_numpy())
ax2.set_xticklabels([2015])

plt.show()

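For more than two years, the per-year subplots can be created in a loop. This is a sketch under the assumption that the dates are kept as a `DatetimeIndex`; `sharey=True` is an illustrative choice that keeps the boxes comparable across panels, and the `Agg` backend line is only there so the code runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=True)

years = sorted(set(df.index.year))

# squeeze=False keeps `axes` two-dimensional even when there is only one year
fig, axes = plt.subplots(1, len(years), sharey=True, squeeze=False)
for ax, year in zip(axes[0], years):
    ax.boxplot(df.loc[df.index.year == year, "Hours_played"].to_numpy())
    ax.set_title(str(year))
    ax.set_xticks([])  # a single box per panel needs no x tick
```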

Let's reshape the data so that each year becomes a column, then call boxplot:

df.set_index(['Dates',df.Dates.dt.year])['Hours_played'].unstack().boxplot()

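To see what the reshaping step produces, the same wide table can be built in two steps. This sketch uses `pivot` in place of `set_index(...).unstack()` and assumes Dates is a regular datetime column; `wide` is a hypothetical variable name, and the `Agg` backend line is only there so the code runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), parse_dates=["Dates"])

# One row per date, one column per year; cells for other years are NaN
wide = df.assign(Year=df["Dates"].dt.year).pivot(
    index="Dates", columns="Year", values="Hours_played"
)

# DataFrame.boxplot draws one box per column and skips the NaN cells
ax = wide.boxplot()
```

Each column holds only the values for its own year, which is why one call yields one box per year.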

If you want to put all the boxes in the same plot, you can do something like this:

import matplotlib.pyplot as plt

def setBoxColors(bp, num_plots):
    color = ['red', 'blue', 'green']
    for idx in range(num_plots):
        plt.setp(bp['boxes'][idx],        color=color[idx])
        plt.setp(bp['caps'][2*idx],       color=color[idx])
        plt.setp(bp['caps'][2*idx+1],     color=color[idx])
        plt.setp(bp['whiskers'][2*idx],   color=color[idx])
        plt.setp(bp['whiskers'][2*idx+1], color=color[idx])
        plt.setp(bp['fliers'][idx],       markeredgecolor=color[idx])
        plt.setp(bp['medians'][idx],      color=color[idx])

# Some fake data to plot
A = [[1, 2, 5,]]
B = [[3, 4, 5]]
C = [[1, 7, 10]]

fig = plt.figure()
ax = plt.axes()

bp = plt.boxplot(A, positions = [2], widths = 0.6, patch_artist=True)
setBoxColors(bp, 1)

bp = plt.boxplot(B, positions = [6], widths = 0.6, patch_artist=True)
setBoxColors(bp, 1)

bp = plt.boxplot(C, positions = [10], widths = 0.6, patch_artist=True)
setBoxColors(bp, 1)

# set axes limits and labels
plt.xlim(0,12)
plt.ylim(0,12)
ax.set_xticks([2, 6, 10])
ax.set_xticklabels(['A', 'B', 'C'])

# draw temporary legend
hB, = plt.plot([1,1],'r-')
plt.legend((hB, ),('Type1', ))
hB.set_visible(False)

plt.show()

With the help of Scott Boston, Y. Luo, and yuhow5566, I was able to devise an interesting answer. From Scott, I learned that it's better not to index the Dates (keep them as a regular column) for this type of boxplot; and from Y. Luo, I learned how to create a new column that holds the year extracted from a datetime value.

df['Year'] = df['Dates'].dt.year

df.boxplot(column='Hours_played', by='Year', figsize=(9,9))

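For completeness, here is a self-contained sketch of this approach on the sample data from the question. It assumes Dates is parsed as a datetime column; clearing the figure title is an optional, illustrative step, and the `Agg` backend line is only there so the code runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), parse_dates=["Dates"])

# Keep Dates as a regular column and derive the grouping key from it
df["Year"] = df["Dates"].dt.year

# One box per distinct Year, all on the same axes
df.boxplot(column="Hours_played", by="Year", figsize=(9, 9))

# Optional: clear the figure-level title that pandas adds for grouped boxplots
plt.suptitle("")
```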
