
Seaborn countplot with a second axis and ordered data

I am trying to create a countplot with a lineplot over it, as practice for some data visualisation I will be doing at work. I am using the Kickstarter projects data on Kaggle (link here).

I run a countplot with hue set to the state of the project (successful, failed, canceled), and both the categories and the hue levels are ordered:

filter_list = ['failed', 'successful', 'canceled']
df2 = df[df.state.isin(filter_list)]

fig = plt.gcf()
fig.set_size_inches(16, 10)
# Bars ordered by category frequency; hue levels ordered by state frequency
sns.countplot(x='main_category', hue='state', data=df2,
              order=df2['main_category'].value_counts().index,
              hue_order=df2['state'].value_counts().index)

This comes out as follows: [image not available]
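For reference, value_counts() sorts by frequency with the most common value first, so the order and hue_order used above put the most frequent category and the most frequent state first. A minimal sketch on a made-up six-row DataFrame (the values are hypothetical, not taken from the Kaggle data):

```python
import pandas as pd

# Hypothetical toy data standing in for the Kickstarter CSV
df2 = pd.DataFrame({
    "main_category": ["Music", "Film", "Music", "Games", "Music", "Film"],
    "state": ["failed", "successful", "failed", "canceled", "successful", "failed"],
})

# value_counts() returns counts sorted in descending order,
# so its index is the left-to-right bar order countplot will use
order = list(df2["main_category"].value_counts().index)
hue_order = list(df2["state"].value_counts().index)

print(order)      # ['Music', 'Film', 'Games']
print(hue_order)  # ['failed', 'successful', 'canceled']
```

The toy data deliberately has no tied counts; with ties, the relative order of the tied values is not something to rely on.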

I then create a second axis and add a lineplot:

fig, ax = plt.subplots()
fig.set_size_inches(16, 10)

# Same bars as before (data=df2, matching the first snippet)
ax = sns.countplot(x='main_category', hue='state', data=df2, ax=ax,
                   order=df2['main_category'].value_counts().index,
                   hue_order=df2['state'].value_counts().index)

# Second y-axis sharing the same x-axis
ax2 = ax.twinx()
sns.lineplot(x='main_category', y='backers', data=df2, ax=ax2)

But this changes the column labels, as seen below: [image not available]

It appears that the data is the same; it's just the order of the columns that is different. I am still learning, so my code may be inefficient or partly redundant, but any help would be appreciated. For completeness, here is how df is created:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

sns.set(style="white", color_codes=True)

df = pd.read_csv('ks.csv')
df = df.drop(['ID'], axis=1)
df.head()

I don't think lineplot is what you are looking for. lineplot is meant to be used with numeric data, not categorical data. I'm surprised this worked at all.
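What has to line up here: seaborn's categorical functions such as countplot draw the i-th entry of order at x = i (0, 1, 2, ...), and ax.twinx() creates an axis that shares that x-axis, so anything drawn on the twin axis must use the same integer positions in the same order. A minimal plain-matplotlib sketch of the idea, assuming a made-up DataFrame and using ax.bar with total counts per category as a simplified stand-in for the hued countplot:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Kickstarter CSV
df2 = pd.DataFrame({
    "main_category": ["Music", "Film", "Music", "Games", "Music", "Film"],
    "backers": [10, 200, 30, 5, 50, 100],
})

order = df2["main_category"].value_counts().index  # Music, Film, Games
positions = np.arange(len(order))                  # one integer slot per category

fig, ax1 = plt.subplots()
# Bars: number of projects per category, drawn at the integer positions
counts = df2["main_category"].value_counts().reindex(order)
ax1.bar(positions, counts.to_numpy())
ax1.set_xticks(positions)
ax1.set_xticklabels(order)

# Line on a twin y-axis: mean backers per category, reindexed to the SAME order
ax2 = ax1.twinx()
mean_backers = df2.groupby("main_category")["backers"].mean().reindex(order)
ax2.plot(positions, mean_backers.to_numpy(), marker="o", color="black")
```

Reindexing the aggregated series by order is what keeps each point above its own bar; groupby on its own returns the categories in alphabetical order.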

I think you are looking for pointplot instead:

filter_list = ['failed', 'successful', 'canceled']
df2 = df[df.state.isin(filter_list)]
order = df2['main_category'].value_counts().index

fig = plt.figure()
ax1 = sns.countplot(x='main_category', hue='state', data=df2, order=order,
                    hue_order=filter_list)
ax2 = ax1.twinx()
# Passing the same order keeps each point aligned with its group of bars
sns.pointplot(x='main_category', y='backers', data=df2, ax=ax2, order=order)


Note that, used like this, pointplot shows the mean number of backers for each category. If that's not what you want, you can pass a different aggregation function using the estimator= parameter.

For example:

sns.pointplot(x='main_category', y='backers', data=df2, ax=ax2, order=order, estimator=np.sum)
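To check what the points represent, the same per-category aggregates can be computed directly in pandas. A sketch on hypothetical toy data; it reproduces only the point heights, not the confidence-interval bars that pointplot also draws:

```python
import pandas as pd

# Hypothetical toy data standing in for the Kickstarter CSV
df2 = pd.DataFrame({
    "main_category": ["Music", "Film", "Music", "Games", "Music", "Film"],
    "backers": [10, 200, 30, 5, 50, 100],
})
order = df2["main_category"].value_counts().index

by_cat = df2.groupby("main_category")["backers"]
mean_backers = by_cat.mean().reindex(order)   # default estimator: the mean
total_backers = by_cat.sum().reindex(order)   # what estimator=np.sum plots

print(mean_backers.tolist())   # [30.0, 150.0, 5.0]
print(total_backers.tolist())  # [90, 300, 5]
```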
