I am trying to create a countplot with a lineplot drawn over it, as practice for some data visualisation I will be doing at work. I am looking at the Kickstarter data on Kaggle.
I run a countplot with a hue on the state of the project (successful, failed, canceled), with both the categories and the hue levels ordered by frequency:
filter_list = ['failed', 'successful', 'canceled']
df2 = df[df.state.isin(filter_list)]
fig = plt.gcf()
fig.set_size_inches( 16, 10)
sns.countplot(x='main_category', hue='state', data=df2,
              order=df2['main_category'].value_counts().index,
              hue_order=df2['state'].value_counts().index)
I then create my second axis and add a lineplot
fig, ax = plt.subplots()
fig.set_size_inches( 16, 10)
ax = sns.countplot(x='main_category', hue='state', data=df, ax=ax,
                   order=df2['main_category'].value_counts().index,
                   hue_order=df2['state'].value_counts().index)
ax2 = ax.twinx()
sns.lineplot(x='main_category', y='backers', data=df2, ax =ax2)
But this changes the column labels as seen below:
It appears that the data is the same; it's just the order of the columns that is different. I am still learning, so my code may be inefficient or partly redundant, but any help would be appreciated. The only other relevant part is how df is created, which is as follows:
import pandas as pd
import numpy as np
import seaborn as sns; sns.set(style="white", color_codes=True)
import matplotlib.pyplot as plt
df = pd.read_csv('ks.csv')
df = df.drop(['ID'], axis = 1)
df.head()
I don't think lineplot is what you are looking for. lineplot is supposed to be used with numeric data, not categorical. I'm even surprised this worked at all.
I think you are looking for pointplot instead:
filter_list = ['failed', 'successful', 'canceled']
df2 = df[df.state.isin(filter_list)]
order = df2['main_category'].value_counts().index
fig = plt.figure()
ax1 = sns.countplot(x='main_category', hue='state', data=df2, order=order,
                    hue_order=filter_list)
ax2 = ax1.twinx()
sns.pointplot(x='main_category', y='backers', data=df2, ax=ax2, order=order)
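For anyone without the Kaggle file at hand, the same layout can be tried on a small made-up frame. This is only a sketch: the rows below are invented for illustration and are not taken from the Kickstarter data, and the Agg backend line is an assumption so that the code runs without a display. The point it shows is that both calls receive the same order, so the bars and the points line up on the shared x-axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to view with plt.show()
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Invented stand-in for the Kickstarter data (illustrative values only).
df = pd.DataFrame({
    'main_category': ['Music'] * 4 + ['Film'] * 3 + ['Games'] * 2,
    'state': ['failed', 'successful', 'failed', 'canceled',
              'successful', 'failed', 'failed',
              'successful', 'canceled'],
    'backers': [10, 30, 0, 20, 100, 50, 30, 5, 15],
})

filter_list = ['failed', 'successful', 'canceled']
df2 = df[df.state.isin(filter_list)]

# One category order, reused by both plots so the columns stay aligned.
order = df2['main_category'].value_counts().index

fig, ax1 = plt.subplots(figsize=(16, 10))
sns.countplot(x='main_category', hue='state', data=df2, ax=ax1,
              order=order, hue_order=filter_list)

# Second y-axis on the same x-axis for the backers series.
ax2 = ax1.twinx()
sns.pointplot(x='main_category', y='backers', data=df2, ax=ax2, order=order)
```

Passing ax=ax1 explicitly, rather than relying on the current axes, is a stylistic choice here; it makes it obvious which axes each plot lands on.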
Note that, used this way, pointplot will show the average (mean) number of backers within each category. If that's not what you want, you can pass another aggregation function using the estimator= parameter, e.g.:
sns.pointplot(x='main_category', y='backers', data=df2, ax=ax2, order=order, estimator=np.sum)
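To see which numbers the points would sit at under each estimator, the same aggregation can be reproduced with a plain pandas groupby. This is a minimal sketch on invented rows (not the real Kickstarter data), and it covers only the point values: pointplot also draws error bars by default, which this table does not reproduce.

```python
import pandas as pd

# Invented stand-in for df2 (illustrative values only).
df2 = pd.DataFrame({
    'main_category': ['Music'] * 4 + ['Film'] * 3 + ['Games'] * 2,
    'backers': [10, 30, 0, 20, 100, 50, 30, 5, 15],
})

# Same category order as used for the plots: most frequent first.
order = df2['main_category'].value_counts().index

# 'mean' corresponds to pointplot's default estimator,
# 'sum' to estimator=np.sum.
summary = (
    df2.groupby('main_category')['backers']
       .agg(['mean', 'sum'])
       .reindex(order)
)
print(summary)
```

Comparing the two columns gives a quick check on whether the mean or the total is the quantity you actually want on the second axis.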