Side-by-side boxplot of multiple columns of a pandas DataFrame

One year of sample data:

import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A":rnd.randn(n), "B":rnd.randn(n)+1},
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))

I want to boxplot these data side by side, grouped by month (i.e., two boxes per month, one for A and one for B).

For a single column, sns.boxplot(df.index.month, df["A"]) works fine. However, sns.boxplot(df.index.month, df[["A", "B"]]) throws an error (ValueError: cannot copy sequence with size 2 to array axis with dimension 365). Melting the data by the index (pd.melt(df, id_vars=df.index, value_vars=["A", "B"], var_name="column")) so that I can use seaborn's hue property as a workaround doesn't work either (TypeError: unhashable type: 'DatetimeIndex').
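The melt attempt fails because id_vars expects column labels, while df.index is the index object itself rather than the name of a column. A minimal sketch, assuming the dates are first moved into an ordinary column with reset_index (an unnamed index becomes a column called "index"), after which melt and a derived month column work:

```python
import numpy.random as rnd
import pandas as pd

n = 365
df = pd.DataFrame(
    data={"A": rnd.randn(n), "B": rnd.randn(n) + 1},
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

# Move the DatetimeIndex into a regular column; an unnamed index becomes "index".
long_df = df.reset_index().melt(
    id_vars="index", value_vars=["A", "B"], var_name="column"
)
# Derive the grouping key from the date column.
long_df["month"] = long_df["index"].dt.month

# long_df has one row per (date, column) pair and can be passed to
# sns.boxplot(x="month", y="value", hue="column", data=long_df).
```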

(A solution doesn't need to use seaborn if plain matplotlib makes it easier.)

Edit

I found a workaround that basically produces what I want. However, it becomes somewhat awkward to work with once the DataFrame includes more variables than I want to plot. So if there is a more elegant or direct way to do it, please share!

df_stacked = df.stack().reset_index()
df_stacked.columns = ["date", "vars", "vals"]
df_stacked.index = df_stacked["date"]
sns.boxplot(x=df_stacked.index.month, y="vals", hue="vars", data=df_stacked)
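One way to keep this stacking workaround manageable when the DataFrame holds extra variables is to select only the columns to plot before stacking. A sketch under that assumption; the column "C" is a hypothetical extra variable added only to illustrate, and a month column is derived instead of re-indexing by date:

```python
import numpy.random as rnd
import pandas as pd

n = 365
df = pd.DataFrame(
    data={
        "A": rnd.randn(n),
        "B": rnd.randn(n) + 1,
        "C": rnd.randn(n) + 10,  # hypothetical extra column that should not be plotted
    },
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

to_plot = ["A", "B"]
# Stack only the selected columns, then name the resulting columns explicitly.
df_stacked = df[to_plot].stack().reset_index()
df_stacked.columns = ["date", "vars", "vals"]
df_stacked["month"] = df_stacked["date"].dt.month

# Usage: sns.boxplot(x="month", y="vals", hue="vars", data=df_stacked)
```

Adding a variable to the plot then only means extending the to_plot list.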

Produces: [image: side-by-side boxplots of A and B, grouped by month]

I don't completely understand your question, but you might want to take a look at this approach using matplotlib. It's not the best solution, though.

1) Break df into 12 DataFrames by month, all stacked in a list:

DFList = []
# groupby yields (month, sub-DataFrame) pairs; keep only the sub-DataFrames
for _, group in df.groupby(df.index.month):
    DFList.append(group)

2) Plot them one after the other in a loop:

import matplotlib.pyplot as plt

# One figure per month, with one box-plot panel per column
for month_df in DFList:
    month_df.plot(kind='box', subplots=True, layout=(2, 2), sharex=True, sharey=True, figsize=(7, 7))

plt.show()

3) Here's a snapshot of the first three rows:

[image]
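Since the question allows plain matplotlib, here is a different technique from the one above (not taken from any answer here): draw all boxes in a single axes and compute their x positions by hand, so that month m gets one box per column, shifted slightly left and right of m. A sketch, assuming Axes.boxplot's positions, widths and patch_artist parameters; the Agg backend line is only there so it runs headless:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy.random as rnd
import pandas as pd

n = 365
df = pd.DataFrame(
    data={"A": rnd.randn(n), "B": rnd.randn(n) + 1},
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

columns = ["A", "B"]  # only these columns are drawn, however many df holds
months = sorted(df.index.month.unique())
width = 0.8 / len(columns)  # the boxes of one month share 0.8 x-units

fig, ax = plt.subplots(figsize=(12, 4))
handles = []
for j, col in enumerate(columns):
    # One array of values per month for this column.
    data = [df.loc[df.index.month == m, col].to_numpy() for m in months]
    # Shift this column's boxes so all columns sit side by side around x = month.
    offset = (j - (len(columns) - 1) / 2) * width
    positions = [m + offset for m in months]
    bp = ax.boxplot(data, positions=positions, widths=width * 0.9, patch_artist=True)
    for box in bp["boxes"]:
        box.set_facecolor(f"C{j}")  # one color-cycle color per column
    handles.append(bp["boxes"][0])

# Replace the per-box ticks with one tick per month.
ax.set_xticks(months)
ax.set_xticklabels([str(m) for m in months])
ax.set_xlabel("month")
ax.legend(handles, columns)
```

Call plt.show() or fig.savefig(...) with a suitable backend to display the result.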

You might also want to check out matplotlib's add_subplot method:

import matplotlib.pyplot as plt

# Split df into one sub-DataFrame per month
month_dfs = []
for _, group in df.groupby(df.index.month):
    month_dfs.append(group)

plt.figure(figsize=(30, 5))
for i, month_df in enumerate(month_dfs):
    # One panel per month, laid out in a single row
    axi = plt.subplot(1, len(month_dfs), i + 1)
    month_df.plot(kind='box', subplots=False, ax=axi)
    plt.title(i + 1)
    plt.ylim([-4, 4])

plt.show()

This gives:

[image]

It's not exactly what you're looking for, but you keep a readable DataFrame if you add more variables.

You can also easily hide the y-axis on every panel except the first by adding

    if i > 0:
        y_axis = axi.get_yaxis()
        y_axis.set_visible(False)

inside the loop, before plt.show().
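An alternative to hiding axes by hand (a different technique from the answer above) is to create all panels up front with plt.subplots(..., sharey=True). The panels then share one y-range, and matplotlib labels the y-ticks only on the left-most panel. A sketch, assuming the Agg backend only so it runs headless:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend; remove for interactive use
import matplotlib.pyplot as plt
import numpy.random as rnd
import pandas as pd

n = 365
df = pd.DataFrame(
    data={"A": rnd.randn(n), "B": rnd.randn(n) + 1},
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

groups = list(df.groupby(df.index.month))
# One row of panels sharing the y-axis; inner panels get no y tick labels.
fig, axes = plt.subplots(1, len(groups), sharey=True, figsize=(30, 5))
for ax, (month, month_df) in zip(axes, groups):
    month_df.plot(kind="box", ax=ax)  # boxes for A and B side by side
    ax.set_title(str(month))

# The y-axis is shared, so setting the limits once applies to every panel.
axes[0].set_ylim(-4, 4)
```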

This is quite straightforward using Altair:

import altair as alt

# Long format: one row per (date, variable), plus a month column
long_df = (
    df.reset_index()
    .melt(id_vars=["index"], value_vars=["A", "B"])
    .assign(month=lambda x: x["index"].dt.month)
)

alt.Chart(long_df).mark_boxplot(
    extent='min-max'
).encode(
    alt.X('variable:N', title=''),
    alt.Y('value:Q'),
    column='month:N',
    color='variable:N'
)

[image]

The code above melts the DataFrame and adds a month column. Altair then creates box plots for each variable, broken down by month into plot columns.

Here's a solution using pandas melt and seaborn:

import pandas as pd
import numpy.random as rnd
import seaborn as sns
n = 365
df = pd.DataFrame(data = {"A": rnd.randn(n),
                          "B": rnd.randn(n)+1,
                          "C": rnd.randn(n) + 10, # will not be plotted
                         },
                  index=pd.date_range(start="2017-01-01", periods=n, freq="D"))
df['month'] = df.index.month
df_plot = df.melt(id_vars='month', value_vars=["A", "B"])
sns.boxplot(x='month', y='value', hue='variable', data=df_plot)
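A variant of the reshaping step, sketched under the assumption that df itself should stay unchanged: DataFrame.assign returns a new frame carrying the month column, so "month" is never added to df. Keeping the plotted columns in one list also keeps extra variables such as C out of the plot.

```python
import numpy.random as rnd
import pandas as pd

n = 365
df = pd.DataFrame(
    data={
        "A": rnd.randn(n),
        "B": rnd.randn(n) + 1,
        "C": rnd.randn(n) + 10,  # will not be plotted
    },
    index=pd.date_range(start="2017-01-01", periods=n, freq="D"),
)

to_plot = ["A", "B"]
# assign() returns a copy, so df keeps only its original columns.
df_plot = df.assign(month=df.index.month).melt(id_vars="month", value_vars=to_plot)
```

df_plot can then be passed to sns.boxplot(x='month', y='value', hue='variable', data=df_plot) as in the answer above.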
