Sort categorical x-axis in a seaborn scatter plot

I am trying to plot the top 30 percent of values in a data frame using a seaborn scatter plot, as shown below.

[image]

The reproducible code for the same plot:

import seaborn as sns

df = sns.load_dataset('iris')

#function to return top 30 percent values in a dataframe.
def extract_top(df):
    n = int(0.3*len(df))
    top = df.sort_values('sepal_length', ascending = False).head(n)

    return top

#storing the top values
top = extract_top(df)

#plotting
sns.scatterplot(data = top,
                x='species', y='sepal_length', 
                color = 'black',
                s = 100,
                marker = 'x',)

Here, I want to sort the x-axis in the order ['virginica','setosa','versicolor']. When I tried to pass order as a parameter to sns.scatterplot(), it raised AttributeError: 'PathCollection' object has no property 'order'. What is the right way to do it?

Please note: setosa is also a category in species in the dataframe, but none of its values fall within the top 30%. Hence, that label does not appear in the output of the reproducible code above. I still want that label on the x-axis, in the given order, as shown below:

[image]

scatterplot() is not the correct tool for the job. Since you have a categorical axis, you want stripplot() rather than scatterplot(). See the difference between relational and categorical plots here: https://seaborn.pydata.org/api.html

sns.stripplot(data = top,
              x='species', y='sepal_length', 
              order = ['virginica','setosa','versicolor'],
              color = 'black', jitter=False)

[image]

The error means that sns.scatterplot() does not accept order as an argument. For the species setosa, you can use alpha to hide its scatter point while keeping the tick.

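import pandas as pd
import seaborn as sns

df = sns.load_dataset('iris')

#function to return top 30 percent values in a dataframe.
def extract_top(df):
    n = int(0.3*len(df))
    top = df.sort_values('sepal_length', ascending = False).head(n)

    return top

#storing the top values
top = extract_top(df)
#add a placeholder row for setosa; DataFrame.append was not in-place
#and was removed in pandas 2.0, so use pd.concat and reassign
top = pd.concat([top, top.iloc[[0]]], ignore_index=True)
#relabel the placeholder row (species is the last column)
top.iloc[-1,-1] = 'setosa'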
order = ['virginica','setosa','versicolor']

#plotting
for species in order:
    alpha = 1 if species != 'setosa' else 0
    sns.scatterplot(x="species", y="sepal_length",
                    data=top[top['species']==species],
                    alpha=alpha,
                    marker='x',color='k')

The output is:

[image: output]

For those who want the extra arguments available in sns.scatterplot but not in sns.stripplot (size and style mappings for variables), it is possible to set the order of the x-axis simply by sorting the dataframe before passing it to seaborn. The following sorts alphabetically.

df.sort_values(feature)
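
If an alphabetical sort is not the order you need, one option is to sort by an ordered categorical key instead. The following is only a sketch under a few assumptions: the helper name sort_by_category_order, the temporary _sort_key column, and the small demo frame are made up for illustration and do not come from the original posts.

```python
import pandas as pd

def sort_by_category_order(frame, column, order):
    """Return a copy of `frame` whose rows follow `order` in `column`.

    Hypothetical helper for illustration only.
    """
    # An ordered Categorical sorts by position in `order`,
    # not alphabetically.
    key = pd.Categorical(frame[column], categories=order, ordered=True)
    return (frame.assign(_sort_key=key)
                 .sort_values("_sort_key", kind="stable")
                 .drop(columns="_sort_key"))

# Small hand-made frame standing in for the real data (made-up rows).
demo = pd.DataFrame({
    "species": ["versicolor", "virginica", "versicolor", "virginica"],
    "sepal_length": [7.0, 7.9, 6.9, 7.7],
})
order = ["virginica", "setosa", "versicolor"]

sorted_demo = sort_by_category_order(demo, "species", order)
print(sorted_demo["species"].tolist())
```

The sorted frame can then be passed to sns.scatterplot(data=sorted_demo, x="species", y="sepal_length"). Note that sorting only reorders rows that exist: a category with no rows (setosa in this demo) still gets no tick, so the placeholder-row or stripplot approaches above are needed for that.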
