
Multiple boxplots based on pandas groups

Here is what my dataframe looks like:

year    item_id      sales_quantity
 2014     1            10
 2014     1             4
 ...      ...          ...

 2015     1             7
 2015     1             10
 ...     ...          ...
 2014     2             1
 2014     2             8
 ...      ...          ...

 2015     2             17
 2015     2             30
 ...     ...          ...
 2014     3             9
 2014     3             18
 ...     ...          ...

For each item_id, I want to draw a boxplot showing the distribution of sales_quantity for each year.

Here is what I tried:

data = pd.DataFrame.from_csv('electronics.csv')
grouped = data.groupby(['year'])
ncols=4
nrows = int(np.ceil(grouped.ngroups/ncols))

fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(35,45), 
sharey=False)

for (key, ax) in zip(grouped.groups.keys(), axes.flatten()):
    grouped.get_group(key).boxplot(x='year', y='sales_quantity', 
    ax=ax, label=key)

I get the error boxplot() got multiple values for argument 'x'. Can someone please tell me how to do this right?

If I have only a single item, then the following works: sns.boxplot(data.sales_quantity, groupby = data.year). How could I extend it to multiple items?

Link to csv

I will leave this simple version for others...

import pandas as pd
import matplotlib.pyplot as plt

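# Whitespace-separated file; the raw string avoids an invalid-escape warning.
df = pd.read_table('sample.txt', delimiter=r'\s+')

item_ids = df['item_id'].unique()
# One subplot per item; squeeze=False keeps axes 2-D even for a single item.
fig, axes = plt.subplots(1, len(item_ids), sharey=True, squeeze=False)
for ax, item_id in zip(axes[0], item_ids):
    # Spread sales_quantity into one column per year (rows keep their original index).
    idf = df.loc[df['item_id'] == item_id, ['year', 'sales_quantity']].pivot(columns='year')
    print(idf)

    idf.plot.box(ax=ax)
    ax.set_title('Item ID {}'.format(item_id))
    # Columns are (value_name, year) tuples; label each box with the year only.
    ax.set_xticklabels([col[1] for col in idf.columns])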

plt.show()

sample.txt

year    item_id      sales_quantity
 2014     1            10
 2014     1             4
 2015     1             7
 2015     1             10
 2014     2             1
 2014     2             8
 2015     2             17
 2015     2             30
 2014     3             9
 2014     3             18
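
As an alternative sketch, pandas' DataFrame.boxplot takes column and by arguments (rather than x and y), so each item's rows can be drawn grouped by year without pivoting. The snippet inlines the sample.txt rows above so it runs on its own; the SAMPLE string, the Agg backend, and the variable names are assumptions made for illustration, not part of the original answer.

```python
import io

import matplotlib
matplotlib.use('Agg')  # assumption: headless backend; remove this line to display interactively
import matplotlib.pyplot as plt
import pandas as pd

# The sample.txt rows from above, inlined so the sketch is self-contained.
SAMPLE = """year item_id sales_quantity
2014 1 10
2014 1 4
2015 1 7
2015 1 10
2014 2 1
2014 2 8
2015 2 17
2015 2 30
2014 3 9
2014 3 18
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s+')

item_ids = sorted(df['item_id'].unique())
# One subplot per item; squeeze=False keeps axes 2-D even for a single item.
fig, axes = plt.subplots(1, len(item_ids), sharey=True, squeeze=False)
for ax, item_id in zip(axes[0], item_ids):
    subset = df[df['item_id'] == item_id]
    # One box per year for this item's sales_quantity values.
    subset.boxplot(column='sales_quantity', by='year', ax=ax)
    ax.set_title('Item ID {}'.format(item_id))
    ax.set_xlabel('year')

# boxplot(by=...) sets a figure-level title on each call; clear it once at the end.
fig.suptitle('')
# Call plt.show() or fig.savefig(...) here to view the result.
```

Grouping with by keeps the data in long form, which may be easier to follow than the pivot when only one value column is plotted.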


Please check the comment in the code.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('electronics_157_3cols.csv')
print(df)

item_ids = df['item_id_copy'].unique()
# One subplot per item ID, sharing the y-axis.
fig, axes = plt.subplots(1, len(item_ids), sharey=True, squeeze=False)
for ax, item_id in zip(axes[0], item_ids):
    idf = df.loc[df['item_id_copy'] == item_id, ['year', 'sales_quantity']].pivot(columns='year')
    print(idf)

    idf.plot.box(ax=ax)
    ax.set_title('ID {}'.format(item_id))
    ax.set_xticklabels([col[1] for col in idf.columns], rotation=45)
    # Remove this line to see the outliers properly; it is kept here only so the boxes stay readable.
    ax.set_ylim(0, 1)

plt.show()


import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('electronics_157_3cols.csv')
print(df)

fig, axes = plt.subplots(2, 5, sharey=True)

# Pair each of the 10 axes (row by row) with an item ID; zip stops at the shorter sequence.
for ax, item_id in zip(axes.flatten(), df['item_id_copy'].unique()):
    idf = df.loc[df['item_id_copy'] == item_id, ['year', 'sales_quantity']].pivot(columns='year')
    print(idf)

    idf.plot.box(ax=ax)
    ax.set_title('ID {}'.format(item_id))
    ax.set_xticklabels([col[1] for col in idf.columns], rotation=0)
    ax.set_ylim(0, 1)

plt.show()
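
If the number of items is not known in advance, the grid shape can be computed rather than hard-coded as 2 by 5, much as the question does with ncols and np.ceil. The following is a minimal sketch under stated assumptions: it reuses the small sample.txt rows (inlined as SAMPLE) instead of electronics_157_3cols.csv, picks ncols = 2 arbitrarily, and sets the Agg backend so it runs without a display.

```python
import io
import math

import matplotlib
matplotlib.use('Agg')  # assumption: headless backend; remove this line to display interactively
import matplotlib.pyplot as plt
import pandas as pd

# The sample.txt rows, inlined so the sketch is self-contained.
SAMPLE = """year item_id sales_quantity
2014 1 10
2014 1 4
2015 1 7
2015 1 10
2014 2 1
2014 2 8
2015 2 17
2015 2 30
2014 3 9
2014 3 18
"""
df = pd.read_csv(io.StringIO(SAMPLE), sep=r'\s+')

item_ids = df['item_id'].unique()
ncols = 2  # hypothetical choice; the question used ncols = 4
nrows = math.ceil(len(item_ids) / ncols)

# squeeze=False always returns a 2-D array, so flatten() works for any grid shape.
fig, axes = plt.subplots(nrows, ncols, sharey=True, squeeze=False)
flat_axes = axes.flatten()

for ax, item_id in zip(flat_axes, item_ids):
    idf = df.loc[df['item_id'] == item_id, ['year', 'sales_quantity']].pivot(columns='year')
    idf.plot.box(ax=ax)
    ax.set_title('ID {}'.format(item_id))
    ax.set_xticklabels([col[1] for col in idf.columns])

# Hide any axes left over when the items do not fill the grid.
for ax in flat_axes[len(item_ids):]:
    ax.set_visible(False)
# Call plt.show() or fig.savefig(...) here to view the result.
```

With three items and two columns this yields a 2 by 2 grid whose last cell is hidden.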

