
Pandas plotting incorrectly sorts the binned values on the graph

I am using Pandas to plot a DataFrame which contains three types of columns: Interest, Gender, and Experience Points.

I want to bin the Experience Points into specific ranges, and then group the DataFrame by the binned values, Interest, and Gender. I then want to plot the counts by Interest for a specific Gender (e.g., Male).

Using the code below, I was able to get my desired plot. However, Pandas sorts the binned values incorrectly on the x-axis (see the attached image).

[Image: bar plot with the binned values out of order on the x-axis]

Notice that when I print my DataFrame, the binned values are in the correct order, but in the graph they are sorted incorrectly.

Experience Points  Interest  Gender
(0, 8]             Bike      Female     9
                             Male       5
                   Hike      Female     6
                             Male      10
                   Swim      Female     7
                             Male       7
(8, 16]            Bike      Female     8
                             Male       3
                   Hike      Female     4
                             Male       7
                   Swim      Female    10
                             Male       4
(16, 24]           Bike      Female     4
                             Male       6
                   Hike      Female    10
...
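
One likely reason the printed order is correct: in current pandas versions, pd.cut returns an ordered categorical, so the bins carry their own order instead of being sorted as text. A minimal sketch, assuming made-up sample values and bin edges (they are illustrative only, not taken from the data above):

```python
import pandas as pd

# Hypothetical sample values, chosen only to land in different bins.
points = pd.Series([3, 9, 17, 120, 5])

# Bin edges are illustrative; the question builds its own edges with np.arange.
binned = pd.cut(points, [0, 8, 16, 24, 200])

# The categories are the intervals, stored in edge order.
print(binned.cat.categories)

# pd.cut marks the categorical as ordered by default.
print(binned.cat.ordered)
```

If the plotted axis does not follow this order, the order was lost somewhere between the grouping and the plot, not in the binning itself.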

My code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib
import random

matplotlib.style.use('ggplot')


interest = ['Swim','Bike','Hike']
gender = ['Male','Female']
experience_points = np.arange(0,200)

df = pd.DataFrame({'Interest':[random.choice(interest) for x in range(1000)],
                   'Gender':[random.choice(gender) for x in range(1000)],
                   'Experience Points':[random.choice(experience_points) for x in range(1000)]})

bins = np.arange(0,136,8)
exp_binned = pd.cut(df['Experience Points'],np.append(bins,df['Experience Points'].max()+1))

exp_distribution = df.groupby([exp_binned,'Interest','Gender']).size()

# The printed result has the binned values in the correct order
print(exp_distribution)

# The plotted result has the binned values in the wrong order
exp_distribution.unstack(['Gender','Interest'])['Male'].plot(kind='bar')

plt.show()

Troubleshooting Steps Tried:

Using plot(kind='bar',sort_columns=True) does NOT fix the issue

Grouping by only binned values and then plotting DOES fix the issue, but then I am unable to group by Interest or Gender. For example the following works:

exp_distribution = df.groupby([exp_binned]).size()
exp_distribution.plot(kind='bar') 

unstack() messed up the order, so the index order has to be restored. You may want to submit a bug report for this.

A workaround:

# .ix has been removed from pandas; .loc performs the same label-based selection here
exp_distribution.unstack(['Gender','Interest']).loc[exp_distribution.index.get_level_values(0).unique(),
                                                    'Male'].plot(kind='bar')

[Image: bar plot produced by the workaround]
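
The same idea, restoring the bin order after unstack(), can also be sketched with reindex() in a self-contained form. This is only an illustration under assumptions: the names rng, edges, counts, bin_order, and male are hypothetical, the data is random with a fixed seed, and observed=False is passed explicitly so that every bin stays in the result. The plot call is left as a comment so the snippet runs without a display.

```python
import numpy as np
import pandas as pd

# Random data shaped like the question's DataFrame (fixed seed for repeatability).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "Interest": rng.choice(["Swim", "Bike", "Hike"], size=n),
    "Gender": rng.choice(["Male", "Female"], size=n),
    "Experience Points": rng.integers(0, 200, size=n),
})

# Same binning as in the question: 8-wide bins up to 128, then one catch-all bin.
edges = np.append(np.arange(0, 136, 8), df["Experience Points"].max() + 1)
exp_binned = pd.cut(df["Experience Points"], edges)

# Count rows per (bin, Interest, Gender); observed=False keeps every bin.
counts = df.groupby([exp_binned, "Interest", "Gender"], observed=False).size()

# Bin labels in the order they appear in the grouped result.
bin_order = counts.index.get_level_values(0).unique()

# Unstack, select one gender, then put the rows back into the bin order.
male = counts.unstack(["Gender", "Interest"])["Male"].reindex(bin_order)

# male.plot(kind="bar")  # uncomment to draw the chart
```

Reindexing is a no-op when unstack() already preserves the order, so the extra step should be harmless on pandas versions that do not show the problem.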
