Multiple boxplots based on pandas groups

Question

Here is what my dataframe looks like:

year    item_id      sales_quantity
 2014     1            10
 2014     1             4
 ...      ...          ...

 2015     1             7
 2015     1             10
 ...     ...          ...
 2014     2             1
 2014     2             8
 ...      ...          ...

 2015     2             17
 2015     2             30
 ...     ...          ...
 2014     3             9
 2014     3             18
 ...     ...          ...

For each item_id, I want to plot a boxplot showing the distribution for each year.

Here is what I tried:

data = pd.DataFrame.from_csv('electronics.csv')
grouped = data.groupby(['year'])
ncols=4
nrows = int(np.ceil(grouped.ngroups/ncols))

fig, axes = plt.subplots(nrows=nrows, ncols=ncols, figsize=(35,45), 
sharey=False)

for (key, ax) in zip(grouped.groups.keys(), axes.flatten()):
    grouped.get_group(key).boxplot(x='year', y='sales_quantity', 
    ax=ax, label=key)

I get the error boxplot() got multiple values for argument 'x'. Can someone please tell me how to do this right?

If I have only a single item, then the following works: sns.boxplot(data.sales_quantity, groupby = data.year). How could I extend it to multiple items?

Link to csv

Answer 1

I will leave this simple version for others...

import pandas as pd
import matplotlib.pyplot as plt

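# Whitespace-delimited file; the raw string avoids an invalid-escape warning.
df = pd.read_table('sample.txt', delimiter=r'\s+')

# One subplot per item, sharing the y-axis so the items are comparable.
fig, axes = plt.subplots(1, 3, sharey=True)
for n, i in enumerate(df['item_id'].unique()):
    # Keep this item's rows and spread sales_quantity into one column per year.
    idf = df[df['item_id'] == i][['year', 'sales_quantity']].pivot(columns='year')
    print(idf)

    idf.plot.box(ax=axes[n])
    axes[n].set_title('Item ID {}'.format(i))
    # Columns are (value name, year) tuples; label the ticks with the year only.
    axes[n].set_xticklabels([e[1] for e in idf.columns])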

plt.show()

sample.txt

year    item_id      sales_quantity
 2014     1            10
 2014     1             4
 2015     1             7
 2015     1             10
 2014     2             1
 2014     2             8
 2015     2             17
 2015     2             30
 2014     3             9
 2014     3             18
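
The error in the question probably arises because `DataFrame.boxplot` takes `column=` and `by=` rather than `x=` and `y=`; an extra `x=` is forwarded to matplotlib, which already receives the data as its first argument. A minimal sketch of the same per-item plot using those parameters is shown here, assuming the rows of sample.txt typed in inline. The helper name `boxplots_by_item` is hypothetical, not part of pandas.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Same rows as sample.txt.
df = pd.DataFrame({
    "year": [2014, 2014, 2015, 2015, 2014, 2014, 2015, 2015, 2014, 2014],
    "item_id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "sales_quantity": [10, 4, 7, 10, 1, 8, 17, 30, 9, 18],
})


def boxplots_by_item(df):
    """Draw one subplot per item_id with one box per year (hypothetical helper)."""
    groups = df.groupby("item_id")
    fig, axes = plt.subplots(1, groups.ngroups, sharey=True, squeeze=False)
    for ax, (item_id, item_df) in zip(axes.flatten(), groups):
        # column= selects the values and by= selects the grouping;
        # x= and y= are not parameters of DataFrame.boxplot.
        item_df.boxplot(column="sales_quantity", by="year", ax=ax)
        ax.set_title("Item ID {}".format(item_id))
    fig.suptitle("")  # clear the automatic "Boxplot grouped by year" title
    return fig, axes


fig, axes = boxplots_by_item(df)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. Grouping with `groupby("item_id")` avoids filtering the frame once per item.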

Answer 2

Please check the comment in the code.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('electronics_157_3cols.csv')
print(df)

# One subplot per distinct item, all sharing the y-axis.
fig, axes = plt.subplots(1, len(df['item_id_copy'].unique()), sharey=True)
for n, i in enumerate(df['item_id_copy'].unique()):
    # Keep this item's rows and spread sales_quantity into one column per year.
    idf = df[df['item_id_copy'] == i][['year', 'sales_quantity']].pivot(columns='year')
    print(idf)

    idf.plot.box(ax=axes[n])
    axes[n].set_title('ID {}'.format(i))
    axes[n].set_xticklabels([e[1] for e in idf.columns], rotation=45)
    axes[n].set_ylim(0, 1)  # Remove this line to see the outliers properly; it is kept here only so the boxes stay readable.

plt.show()
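
The `set_ylim(0, 1)` call in the code above clips the view so that extreme points do not squash the boxes. One possible alternative, sketched here, is to pass `showfliers=False`, which matplotlib's boxplot accepts and pandas forwards, so the outliers are not drawn and the axis scales to the whiskers. The values below are invented for illustration and are not taken from the CSV.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Invented values for illustration: one extreme point per year.
idf = pd.DataFrame({2014: [0.1, 0.2, 0.3, 0.4, 50.0],
                    2015: [0.2, 0.3, 0.5, 0.6, 80.0]})

fig, (ax_clipped, ax_hidden) = plt.subplots(1, 2)

# The approach used in the answer: draw everything, then clip the view to 0..1.
idf.plot.box(ax=ax_clipped)
ax_clipped.set_ylim(0, 1)
ax_clipped.set_title("set_ylim(0, 1)")

# Alternative: do not draw the outliers, so autoscaling follows the whiskers.
idf.plot.box(ax=ax_hidden, showfliers=False)
ax_hidden.set_title("showfliers=False")
```

Hiding the outliers keeps each panel readable without hard-coding a limit, at the cost of no longer showing that extreme values exist.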

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv('electronics_157_3cols.csv')
print(df)

# A 2x5 grid holds up to ten items; axes.flatten() walks it row by row.
fig, axes = plt.subplots(2, 5, sharey=True)

for ax, i in zip(axes.flatten(), df['item_id_copy'].unique()):
    # Keep this item's rows and spread sales_quantity into one column per year.
    idf = df[df['item_id_copy'] == i][['year', 'sales_quantity']].pivot(columns='year')
    print(idf)

    idf.plot.box(ax=ax)
    ax.set_title('ID {}'.format(i))
    ax.set_xticklabels([e[1] for e in idf.columns], rotation=0)
    ax.set_ylim(0, 1)

plt.show()
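
The grid above is hard-coded to 2x5. As a rough sketch, the number of rows could instead be derived from the number of items, as the question attempted with `np.ceil`, and any unused axes hidden. The function name `plot_items_grid` and its parameters are hypothetical, and the data is the sample from Answer 1 typed in inline.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Same rows as sample.txt from Answer 1.
df = pd.DataFrame({
    "year": [2014, 2014, 2015, 2015, 2014, 2014, 2015, 2015, 2014, 2014],
    "item_id": [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    "sales_quantity": [10, 4, 7, 10, 1, 8, 17, 30, 9, 18],
})


def plot_items_grid(df, item_col="item_id", ncols=4):
    """Plot one per-year boxplot panel per item on an auto-sized grid (hypothetical helper)."""
    items = df[item_col].unique()
    nrows = math.ceil(len(items) / ncols)
    fig, axes = plt.subplots(nrows, ncols, sharey=True, squeeze=False)
    flat = axes.flatten()
    for ax, item in zip(flat, items):
        # Same pivot as in the answers: one column of sales_quantity per year.
        idf = df.loc[df[item_col] == item, ["year", "sales_quantity"]].pivot(columns="year")
        idf.plot.box(ax=ax)
        ax.set_title("ID {}".format(item))
        ax.set_xticklabels([year for _, year in idf.columns])
    # Hide grid cells that have no item to show.
    for ax in flat[len(items):]:
        ax.set_visible(False)
    return fig, axes


# Three items on a two-column grid: two rows, with the last cell hidden.
fig, axes = plot_items_grid(df, ncols=2)
```

Passing `squeeze=False` keeps `axes` two-dimensional even for a single row, so the same loop works for any item count.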
