Multiple boxplots by date in index
My dataframe:
index Dates Hours_played
0 2014-11-06 11
1 2014-12-06 4
2 2015-09-06 5
3 2015-07-06 5
Then, I set Dates as the index:
Hours_played
Dates
2014-11-06 11
2014-12-06 4
2015-09-06 5
2015-07-06 5
The problem: when I tried to create one box plot for each year found in the index, both plots were drawn on the same grid.
df.loc['2014']['Hours_played'].plot.box(ylim=(0,200))
df.loc['2015']['Hours_played'].plot.box(ylim=(0,200))
I tried the following, but the plot comes up empty:
data_2015 = df.loc['2015']['Hours_played']
data_2016 = df.loc['2016']['Hours_played']
data_to_plot = [data_2015, data_2016]
mpl_fig = plt.figure()
ax = mpl_fig.add_subplot(111)
ax.boxplot(data_to_plot)
ax.set_ylim(0,300)
Is it possible to have them in the same grid, side by side?
A simple solution is to group by year first and then draw the boxplot:
import io
import matplotlib.pyplot as plt
import pandas as pd
# Re-create your sample data
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_table(io.StringIO(s), sep=',', index_col=0, parse_dates=True)
# The code below is the part relevant to your question.
df.groupby(df.index.year).boxplot()
plt.show()
Your second method ends up with an empty plot because matplotlib fails to recognize pandas.DataFrame correctly. Try using the NumPy array representation:
import io
import matplotlib.pyplot as plt
import pandas as pd
# Re-create your sample data
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_table(io.StringIO(s), sep=',', index_col=0, parse_dates=True)
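# The code below is the part relevant to your question.
# as_matrix() was removed in pandas 1.0; to_numpy() on the column gives a 1-D array.
data_2014 = df.loc[df.index.year == 2014, 'Hours_played'].to_numpy()
data_2015 = df.loc[df.index.year == 2015, 'Hours_played'].to_numpy()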
data_to_plot = [data_2014, data_2015]
mpl_fig = plt.figure()
ax = mpl_fig.add_subplot(111)
ax.boxplot(data_to_plot)
plt.show()
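When the number of years is not known in advance, the per-year arrays can be built with a groupby instead of one variable per year. The following is only a sketch under that assumption: it reuses the sample data from above, and the names `groups`, `years`, and `data_to_plot` are illustrative rather than part of the original answer.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the answer above.
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), index_col=0, parse_dates=True)

# Group the column by the year of the DatetimeIndex;
# each group becomes one 1-D NumPy array, i.e. one box.
groups = df["Hours_played"].groupby(df.index.year)
years = [int(year) for year, _ in groups]
data_to_plot = [values.to_numpy() for _, values in groups]

fig, ax = plt.subplots()
ax.boxplot(data_to_plot)
# Boxes are drawn at x = 1, 2, ...; label each one with its year.
ax.set_xticks(range(1, len(years) + 1))
ax.set_xticklabels([str(year) for year in years])
# Call plt.show() to display the figure in an interactive session.
```

All boxes land on one axes, side by side, so they share the same y scale.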
To use subplots, you will need to plot them one by one:
import io
import matplotlib.pyplot as plt
import pandas as pd
# Re-create your sample data
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_table(io.StringIO(s), sep=',', parse_dates=[0])
df['Year'] = df.Dates.dt.year
df.set_index(['Year', 'Dates'], inplace=True)
# The code below is the part relevant to your question.
mpl_fig = plt.figure()
ax1 = mpl_fig.add_subplot(121)
ax1.boxplot(df.loc[2014]['Hours_played'], labels=[2014])
ax2 = mpl_fig.add_subplot(122)
ax2.boxplot(df.loc[2015]['Hours_played'], labels=[2015])
plt.show()
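The same one-subplot-per-year layout can also be written as a loop, which avoids repeating the `add_subplot` call for every year. This is a hedged variant, not the original answer's code: it keeps Dates as a regular column, and `sharey=True` is an assumption that you want the years compared on a common y scale.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the answer above.
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
df = pd.read_csv(io.StringIO(s), parse_dates=["Dates"])
df["Year"] = df["Dates"].dt.year

years = sorted(df["Year"].unique())
# squeeze=False keeps `axes` two-dimensional even when there is only one year.
fig, axes = plt.subplots(1, len(years), sharey=True, squeeze=False)
for ax, year in zip(axes[0], years):
    hours = df.loc[df["Year"] == year, "Hours_played"].to_numpy()
    ax.boxplot(hours)
    # A single box is drawn at x = 1; label it with the year.
    ax.set_xticks([1])
    ax.set_xticklabels([str(year)])
# Call plt.show() to display the figure in an interactive session.
```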
If you want to put all the boxes in the same plot, you can do something like this:
import matplotlib.pyplot as plt

def setBoxColors(bp, num_plots):
    # Color every artist that belongs to each of the first num_plots boxes.
    color = ['red', 'blue', 'green']
    for idx in range(num_plots):
        plt.setp(bp['boxes'][idx], color=color[idx])
        plt.setp(bp['caps'][2*idx], color=color[idx])
        plt.setp(bp['caps'][2*idx+1], color=color[idx])
        plt.setp(bp['whiskers'][2*idx], color=color[idx])
        plt.setp(bp['whiskers'][2*idx+1], color=color[idx])
        # Recent matplotlib versions create one fliers artist per box.
        plt.setp(bp['fliers'][idx], color=color[idx])
        plt.setp(bp['medians'][idx], color=color[idx])

# Some fake data to plot
A = [[1, 2, 5]]
B = [[3, 4, 5]]
C = [[1, 7, 10]]
fig = plt.figure()
ax = plt.axes()
# plt.hold(True) was removed in matplotlib 3.0; axes now keep earlier artists by default.
bp = plt.boxplot(A, positions=[2], widths=0.6, patch_artist=True)
setBoxColors(bp, 1)
bp = plt.boxplot(B, positions=[6], widths=0.6, patch_artist=True)
setBoxColors(bp, 1)
bp = plt.boxplot(C, positions=[10], widths=0.6, patch_artist=True)
setBoxColors(bp, 1)
# set axes limits and labels
plt.xlim(0, 12)
plt.ylim(0, 12)
# Set the tick positions first, then the labels that go with them.
ax.set_xticks([2, 6, 10])
ax.set_xticklabels(['A', 'B', 'C'])
# draw temporary legend
hB, = plt.plot([1, 1], 'r-')
plt.legend((hB,), ('Type1',))
hB.set_visible(False)
plt.show()
With the help of Scott Boston, Y. Luo, and yuhow5566, I was able to devise an interesting answer. From Scott, I learned that it's better not to index the Dates (keep them as a regular column) for this type of boxplot; and from Y. Luo, I learned how to create a new column while isolating the year from a datetime value.
df['Year'] = df['Dates'].dt.year
df.boxplot(column='Hours_played', by='Year', figsize=(9,9))
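For completeness, here is a self-contained sketch of this approach that can be run as-is. It assumes Dates is kept as a regular datetime column (not the index) and borrows the sample data from the earlier answers; the cleanup of the automatic title at the end is optional and not part of the original answer.

```python
import io

import matplotlib.pyplot as plt
import pandas as pd

# Sample data borrowed from the earlier answers.
s = """Dates,Hours_played
2014-11-06,11
2014-12-06,4
2015-09-06,5
2015-07-06,5"""
# Keep Dates as a regular column and parse it as datetime.
df = pd.read_csv(io.StringIO(s), parse_dates=["Dates"])

# Isolate the year in its own column, then let pandas draw one box per year.
df["Year"] = df["Dates"].dt.year
ax = df.boxplot(column="Hours_played", by="Year", figsize=(9, 9))

# Optional: pandas adds a "Boxplot grouped by Year" super-title; clear it.
ax.get_figure().suptitle("")
# Call plt.show() to display the figure in an interactive session.
```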