
Sorting the order of bars in pandas/matplotlib bar plots

What is the Pythonic/pandas way of sorting 'levels' within a column in pandas to give a specific ordering of bars in a bar plot?

For example, given:

import pandas as pd
df = pd.DataFrame({
    'group': ['a', 'a', 'a', 'a', 'a', 'a', 'a', 
              'b', 'b', 'b', 'b', 'b', 'b', 'b'],
    'day': ['Mon', 'Tues', 'Fri', 'Thurs', 'Sat', 'Sun', 'Weds',
            'Fri', 'Sun', 'Thurs', 'Sat', 'Weds', 'Mon', 'Tues'],
    'amount': [1, 2, 4, 2, 1, 1, 2, 4, 5, 3, 4, 2, 1, 3]})
dfx = df.groupby(['group'])
dfx.plot(kind='bar', x='day')

I can generate the following pair of plots:

[image: unordered bar charts]

The order of the bars follows the row order.

What's the best way of reordering the data so that the bar charts have bars ordered Mon-Sun?

UPDATE: this rubbish solution works - but it's far from elegant in the way it uses an extra sorting column:

df2 = pd.DataFrame({
    'day': ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun'],
    'num': [0, 1, 2, 3, 4, 5, 6]})
df = pd.merge(df, df2, on='day')
df = df.sort_values('num')
dfx = df.groupby(['group'])
dfx.plot(kind='bar', x='day')

FURTHER GENERALISATION:

Is there a solution that also fixes the order of bars in a 'dodged' bar plot:

df.pivot(index='day', columns='group', values='amount').plot(kind='bar')


You'll have to provide a mapping to specify how to order the day names. (If they were stored as proper dates, there would be other ways to do this.)

Updated:

Build the key. You could write out a dictionary explicitly or use something clever like this dict comprehension.

weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}
key = df['day'].map(mapping)

And the sorting is simple:

df.iloc[key.argsort()]
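
Putting the two steps together, a minimal self-contained sketch on the question's sample data might look like the following. The name `df_sorted` is introduced here for illustration only, and passing `kind='stable'` to NumPy's `argsort` is an addition of this sketch (not part of the answer above) so that rows sharing a day keep their original relative order.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['a'] * 7 + ['b'] * 7,
    'day': ['Mon', 'Tues', 'Fri', 'Thurs', 'Sat', 'Sun', 'Weds',
            'Fri', 'Sun', 'Thurs', 'Sat', 'Weds', 'Mon', 'Tues'],
    'amount': [1, 2, 4, 2, 1, 1, 2, 4, 5, 3, 4, 2, 1, 3]})

# Map each day name to its position in the week
weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}
key = df['day'].map(mapping)

# Row positions that put the key in ascending order (stable for ties)
positions = key.to_numpy().argsort(kind='stable')

# Hypothetical name: the frame reordered Mon..Sun
df_sorted = df.iloc[positions]
```

From here, `df_sorted.groupby('group').plot(kind='bar', x='day')` should draw each group's bars in weekday order, since the bars follow the row order.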

I know this response is late, but a simple solution to the two cases presented, without using a dictionary or mapping, would be something like what I've posted below.

Setting 'day' as the index lets you use .loc to select rows in a specific order.

1) For the two separate plots

df=pd.DataFrame({'group':['a','a','a','a','a','a','a','b','b','b','b','b','b','b'],
     'day':['Mon','Tues','Fri','Thurs','Sat','Sun','Weds','Fri','Sun','Thurs','Sat','Weds','Mon','Tues'],
     'amount':[1,2,4,2,1,1,2,4,5,3,4,2,1,3]})

order = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
df.set_index('day').loc[order].groupby('group').plot(kind='bar')

2) For the pivot example with the dodged plot:

order = ['Mon', 'Tues', 'Weds','Thurs','Fri','Sat','Sun']
df.pivot(index='day', columns='group', values='amount').loc[order].plot(kind='bar')

Note that pivot already puts day in the index, so you can use .loc here again.
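
As a rough check of the pivot case without drawing anything, the sketch below verifies that the pivoted table ends up in weekday order. Two things here are choices of this sketch rather than of the answer: `pivot` is called with keyword arguments, and `reindex` is swapped in for `.loc`. The difference is that `reindex` fills a day missing from the data with NaN, whereas `.loc` raises a KeyError for a missing label. The name `pivoted` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['a'] * 7 + ['b'] * 7,
    'day': ['Mon', 'Tues', 'Fri', 'Thurs', 'Sat', 'Sun', 'Weds',
            'Fri', 'Sun', 'Thurs', 'Sat', 'Weds', 'Mon', 'Tues'],
    'amount': [1, 2, 4, 2, 1, 1, 2, 4, 5, 3, 4, 2, 1, 3]})

order = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']

# One row per day, one column per group
pivoted = df.pivot(index='day', columns='group', values='amount')

# Reorder the index to Mon..Sun (reindex instead of .loc: an assumption of this sketch)
pivoted = pivoted.reindex(order)

print(pivoted)
```

Calling `pivoted.plot(kind='bar')` on the result should then give the dodged plot with the days in the requested order.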

Edit: it is best practice to use .loc instead of .ix in these solutions. .ix is deprecated (it was removed in pandas 1.0) and can give confusing results when column names and index labels are numbers.

I will provide code below to extend Dan's answer to address the "FURTHER GENERALISATION" section of the OP's question. First, a complete example for the simple case (just one variable) based on Dan's solution:

import pandas as pd

# Create dataframe 
df=pd.DataFrame({
    'group':['a','a','a','a','a','a','a','b','b','b','b','b','b','b'],
    'day':['Mon','Tues','Fri','Thurs','Sat','Sun','Weds','Fri','Sun','Thurs','Sat','Weds','Mon','Tues'],
    'amount':[1,2,4,2,1,1,2,4,5,3,4,2,1,3]
})


# Calculate the total amount for each day (select 'amount' before summing
# so the non-numeric 'group' column is not aggregated)
df_grouped = df.groupby('day')['amount'].sum().reset_index()

# Use Dan's trick to order day names in the table created by groupby
weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}
key = df_grouped['day'].map(mapping)
df_grouped = df_grouped.iloc[key.argsort()]

# Draw the bar chart
df_grouped.plot(kind='bar', x='day')

And now, we use the same ordering technique to order the rows of the pivot table (instead of the rows created by groupby).

import pandas as pd

# Create dataframe 
df=pd.DataFrame({
    'group':['a','a','a','a','a','a','a','b','b','b','b','b','b','b'],
    'day':['Mon','Tues','Fri','Thurs','Sat','Sun','Weds','Fri','Sun','Thurs','Sat','Weds','Mon','Tues'],
    'amount':[1,2,4,2,1,1,2,4,5,3,4,2,1,3]
})

# Get the amount for each day AND EACH GROUP
df_grouped = df.groupby(['group', 'day'])['amount'].sum().reset_index()

# Create a pivot table with the total amount for each day and each group,
# in the format pandas needs to plot multiple series
df_pivot = df_grouped.pivot(index='day', columns='group', values='amount').reset_index()

# Use Dan's trick to order day names in the table created by PIVOT
# (not the table created by groupby, as in the previous example)
weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']
mapping = {day: i for i, day in enumerate(weekdays)}
key = df_pivot['day'].map(mapping)
df_pivot = df_pivot.iloc[key.argsort()]

# Draw the bar chart
df_pivot.plot(kind='bar', x='day')

The result is shown below:

[image: resulting bar chart]
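
A different way to encode the same ordering, not used in any of the answers above, is to turn the 'day' column into an ordered categorical so that pandas itself knows Mon comes before Tues. The sketch below is only an illustration under that assumption. The names `df_sorted` and `pivoted` are hypothetical, and `sort_index()` is called explicitly rather than relying on how `pivot` happens to order a categorical index.

```python
import pandas as pd

df = pd.DataFrame({
    'group': ['a'] * 7 + ['b'] * 7,
    'day': ['Mon', 'Tues', 'Fri', 'Thurs', 'Sat', 'Sun', 'Weds',
            'Fri', 'Sun', 'Thurs', 'Sat', 'Weds', 'Mon', 'Tues'],
    'amount': [1, 2, 4, 2, 1, 1, 2, 4, 5, 3, 4, 2, 1, 3]})

weekdays = ['Mon', 'Tues', 'Weds', 'Thurs', 'Fri', 'Sat', 'Sun']

# Store the days as an ordered categorical: sorting now follows `weekdays`
df['day'] = pd.Categorical(df['day'], categories=weekdays, ordered=True)

# Case 1: one plot per group, rows sorted by group and then by weekday
df_sorted = df.sort_values(['group', 'day'])

# Case 2: dodged plot, index sorted by the categorical order
pivoted = df.pivot(index='day', columns='group', values='amount').sort_index()
```

With this in place, `df_sorted.groupby('group').plot(kind='bar', x='day')` and `pivoted.plot(kind='bar')` should both draw the bars Mon to Sun without a separate mapping or helper column.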
