
Sort categorical x-axis in a seaborn scatter plot

I am trying to plot the top 30 percent of values in a data frame using a seaborn scatter plot, as shown below.

[Image: scatter plot of the top 30% of sepal_length values by species]

Reproducible code for the same plot:

import seaborn as sns

df = sns.load_dataset('iris')

#function to return top 30 percent values in a dataframe.
def extract_top(df):
    n = int(0.3*len(df))
    top = df.sort_values('sepal_length', ascending = False).head(n)

    return top

#storing the top values
top = extract_top(df)

#plotting
sns.scatterplot(data = top,
                x='species', y='sepal_length', 
                color = 'black',
                s = 100,
                marker = 'x',)

Here, I want to sort the x-axis in the order ['virginica','setosa','versicolor']. When I tried passing order as a parameter to sns.scatterplot(), it raised AttributeError: 'PathCollection' object has no property 'order'. What is the right way to do it?

Please note: setosa is also a category of species in the dataframe, but none of its values fall within the top 30%. Hence, that label is not shown in the example output from the reproducible code above. But I want that label on the x-axis as well, in the given order, as shown below:

[Image: desired plot with the x-axis ordered virginica, setosa, versicolor]
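The label vanishes because filtering a plain string column keeps only the values that survive the filter, so there is nothing left for seaborn to put a setosa tick on. A pandas categorical column, by contrast, keeps its full category list after filtering. A minimal sketch on a small hypothetical frame (made-up numbers, not the real iris data):

```python
import pandas as pd

# Hypothetical miniature of the iris frame; the numbers are made up.
df = pd.DataFrame({
    "species": ["setosa", "versicolor", "virginica", "virginica", "setosa"],
    "sepal_length": [5.0, 7.0, 7.9, 7.7, 5.1],
})
order = ["virginica", "setosa", "versicolor"]

# Plain string column: the filter keeps only the species that survive it.
top = df[df["sepal_length"] > 6.5]
plain_counts = top["species"].value_counts().to_dict()

# Categorical column: every category is still known after the same filter,
# so setosa is reported with a count of 0 instead of disappearing.
df_cat = df.assign(species=pd.Categorical(df["species"], categories=order))
top_cat = df_cat[df_cat["sepal_length"] > 6.5]
cat_counts = top_cat["species"].value_counts().to_dict()
```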

scatterplot() is not the correct tool for the job. Since you have a categorical axis, you want stripplot() rather than scatterplot(). See the difference between relational and categorical plots here: https://seaborn.pydata.org/api.html

sns.stripplot(data = top,
              x='species', y='sepal_length', 
              order = ['virginica','setosa','versicolor'],
              color = 'black', jitter=False)

[Image: strip plot with the x-axis ordered virginica, setosa, versicolor]
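Conceptually, order= gives each listed level an integer slot on the axis (0, 1, 2, ...) and draws a tick for every level, whether or not it has rows. The plain-Matplotlib sketch below reproduces that idea by hand; it is not seaborn's actual implementation, the data are hypothetical, and it assumes Matplotlib 3.5 or newer for the labels= argument of set_xticks.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt

order = ["virginica", "setosa", "versicolor"]
# Hypothetical (species, sepal_length) pairs standing in for `top`.
rows = [("virginica", 7.9), ("versicolor", 7.0), ("virginica", 7.7)]

# Each level in `order` gets an integer position on the x-axis.
position = {level: i for i, level in enumerate(order)}

fig, ax = plt.subplots()
xs = [position[species] for species, _ in rows]
ys = [length for _, length in rows]
ax.scatter(xs, ys, color="black", marker="x")

# One tick per level in `order`, including setosa, which has no points.
ax.set_xticks(range(len(order)), labels=order)
ax.set_xlabel("species")
ax.set_ylabel("sepal_length")
```

Because the positions come from the order list rather than from the data, an empty level still occupies its slot.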

sns.scatterplot() does not accept order as one of its arguments. For the species setosa, which has no rows in top, you can add a placeholder point and use alpha=0 to hide it while keeping its tick.

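import pandas as pd
import seaborn as sns

df = sns.load_dataset('iris')

#function to return top 30 percent values in a dataframe.
def extract_top(df):
    n = int(0.3*len(df))
    top = df.sort_values('sepal_length', ascending = False).head(n)

    return top

#storing the top values
top = extract_top(df)

#add a copy of the first row, relabelled 'setosa', so that category has a (hidden) point;
#pd.concat returns a new frame, so assign it back to top
dummy = top.iloc[[0]].copy()
dummy['species'] = 'setosa'
top = pd.concat([top, dummy], ignore_index=True)
order = ['virginica','setosa','versicolor']

#plotting: one call per species, in the desired order, so the x categories appear in that order;
#the setosa placeholder is drawn fully transparent
for species in order:
    alpha = 1 if species != 'setosa' else 0
    sns.scatterplot(x="species", y="sepal_length",
                    data=top[top['species']==species],
                    alpha=alpha,
                    marker='x',color='k')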

The output is:

[Image: output plot]
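A different technique that avoids the placeholder row: Matplotlib's categorical axis assigns positions in order of first appearance, so you can register the full order on the axis before drawing anything with Axis.update_units. The sketch uses plain ax.scatter with hypothetical data. Calling sns.scatterplot(..., ax=ax) after the update_units line should behave the same way, since seaborn draws through Matplotlib's categorical units, but treat that as an assumption to check on your seaborn version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

order = ["virginica", "setosa", "versicolor"]

fig, ax = plt.subplots()
# Register every category, in the desired order, before any data is drawn.
# Matplotlib then maps virginica -> 0, setosa -> 1, versicolor -> 2.
ax.xaxis.update_units(order)

# Hypothetical top-30% data: no setosa rows, as in the question.
species = ["virginica", "versicolor", "virginica"]
sepal_length = [7.9, 7.0, 7.7]
ax.scatter(species, sepal_length, color="black", marker="x")
```

Later string data only appends categories the axis has not seen yet, so the pre-registered order (including the empty setosa slot) is kept.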

For those wanting to use the extra arguments that sns.scatterplot offers over sns.stripplot (size and style mappings for variables), it's possible to set the order of the x-axis simply by sorting the dataframe before passing it to seaborn, because the categories are placed in the order they first appear in the data. The following sorts alphabetically:

sns.scatterplot(data=top.sort_values('species'), x='species', y='sepal_length')
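A plain sort only gives alphabetical order. To get a custom order such as virginica, setosa, versicolor, one option is to sort by an ordered pandas Categorical built from that list. A sketch with a hypothetical stand-in for top:

```python
import pandas as pd

# Hypothetical stand-in for `top`; rows are in arbitrary order.
top = pd.DataFrame({
    "species": ["versicolor", "virginica", "versicolor", "virginica"],
    "sepal_length": [7.0, 7.9, 6.9, 7.7],
})
order = ["virginica", "setosa", "versicolor"]

# An ordered categorical sorts by position in `order`, not alphabetically.
top["species"] = pd.Categorical(top["species"], categories=order, ordered=True)
# kind="stable" keeps the original row order within each species.
top_sorted = top.sort_values("species", kind="stable")
```

You would then pass data=top_sorted to sns.scatterplot(). Whether the empty setosa category also gets a tick from the categorical dtype depends on how your seaborn version handles unused categories, so verify that part rather than rely on it.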

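Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original URL. For any questions, contact: yoyou2525@163.com.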

 