Seaborn countplot with second axis with ordered data

Question

I am trying to create a countplot with a lineplot over it, as practice for some data visualisation I will be doing at work. I am using the Kickstarter data on Kaggle.

I run a countplot with a hue on the state of the project (successful, failed, canceled), with both the categories and the hue levels ordered by frequency:

filter_list = ['failed', 'successful', 'canceled']
df2 = df[df.state.isin(filter_list)]

fig = plt.gcf()
fig.set_size_inches(16, 10)
sns.countplot(x='main_category', hue='state', data=df2,
              order=df2['main_category'].value_counts().index,
              hue_order=df2['state'].value_counts().index)

This comes out as follows:

I then create my second axis and add a lineplot

fig, ax = plt.subplots()
fig.set_size_inches(16, 10)

ax = sns.countplot(x='main_category', hue='state', data=df, ax=ax,
                   order=df2['main_category'].value_counts().index,
                   hue_order=df2['state'].value_counts().index)

ax2 = ax.twinx()
sns.lineplot(x='main_category', y='backers', data=df2, ax=ax2)

But this changes the column labels as seen below:

It appears that the data is the same; it's just that the order of the columns is different. I am still learning, so my code may be inefficient or partly redundant, but any help would be appreciated. The only other relevant part is how df is created, which is as follows:

import pandas as pd
import numpy as np
import seaborn as sns; sns.set(style="white", color_codes=True)
import matplotlib.pyplot as plt

df = pd.read_csv('ks.csv')
df = df.drop(['ID'], axis = 1)
df.head()

Answer 1

I don't think lineplot is what you are looking for. lineplot is meant to be used with numeric data, not categorical data. I'm surprised this worked at all.

I think you are looking for pointplot instead:

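# Keep only the three project states of interest
filter_list = ['failed', 'successful', 'canceled']
df2 = df[df.state.isin(filter_list)]
# Category order, most frequent first; reused for both plots so the x positions match
order = df2['main_category'].value_counts().index

fig = plt.figure()
ax1 = sns.countplot(x='main_category', hue='state', data=df2, order=order,
                    hue_order=filter_list)
# Second y-axis sharing the same x-axis
ax2 = ax1.twinx()
sns.pointplot(x='main_category', y='backers', data=df2, ax=ax2, order=order)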

Note that, used like this, pointplot shows the average number of backers within each category. If that's not what you want, you can pass a different aggregation function using the estimator= parameter, e.g.:

sns.pointplot(x='main_category', y='backers', data=df2, ax=ax2, order=order, estimator=np.sum)
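
To check what each estimator would plot, the per-category values can be computed directly with pandas. This is only a rough sketch: the toy DataFrame and its numbers are made up for illustration and are not taken from the Kickstarter data, and the groupby is a stand-in for the aggregation pointplot performs, not seaborn's own code.

```python
import pandas as pd

# Tiny made-up stand-in for the Kickstarter data (values are illustrative only).
toy = pd.DataFrame({
    "main_category": ["Film", "Film", "Film", "Music", "Music", "Games"],
    "backers": [10, 30, 20, 5, 15, 100],
})

# Same ordering rule as above: categories sorted by how often they occur.
order = toy["main_category"].value_counts().index

# Default pointplot estimator: the mean of y within each category.
mean_backers = toy.groupby("main_category")["backers"].mean().reindex(order)

# With a sum estimator: the total of y within each category.
total_backers = toy.groupby("main_category")["backers"].sum().reindex(order)

print(mean_backers)
print(total_backers)
```

Reindexing by order puts the aggregated values in the same left-to-right sequence as the bars, which makes it easy to compare them against the points drawn on the second axis.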

solution1
2 ACCEPTED 2020-01-02 13:08:26