Python seaborn plotting from dataframe that was filtered using `pd.Categorical`

I'm trying to plot some data from a subset of my dataframe, but the plot shows empty ticks for data that should have been filtered out. I know the issue is that I used `pd.Categorical()`, but I need to. How do I plot only the filtered data (i.e. just `a1` and `a2`) with no extra ticks? Example:

import numpy as np
import pandas as pd
data = {'A':['a2', 'a2', 'a2', 'a1', 'a1', 'a1', 'a3', 'a3', 'a3'],
        'B': np.random.normal(0, 1, 9)}

df = pd.DataFrame(data)

`df`:

df
Out[1]: 
    A         B
0  a2 -1.076173
1  a2 -2.574480
2  a2  0.863081
3  a1  1.411732
4  a1 -0.937692
5  a1  0.929105
6  a3 -1.071276
7  a3  0.901292
8  a3  0.740417


# Sort A using pd.Categorical
df['A'] = pd.Categorical(df['A'], ['a1', 'a2', 'a3'])
df = df.sort_values(by='A')

plotdf = df.loc[df['A']!='a3']

`plotdf` should now be a subset of `df`, which it is:

plotdf
Out[2]: 
    A         B
3  a1  1.411732
4  a1 -0.937692
5  a1  0.929105
0  a2 -1.076173
1  a2 -2.574480
2  a2  0.863081

But when we plot it, the filtered-out tick position is retained:

import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
sns.barplot(x='A', y='B', data=plotdf, ax=ax)
plt.show()

Do I need to re-specify the categories before I plot? That seems a bit odd...

This seems to be an effect of the categorical dtype, which keeps all of its declared categories even when some of them no longer occur in the data (see `print(plotdf['A'].dtype)`).

For example, running `plotdf.groupby('A').size()` returns

A
a1    3
a2    3
a3    0

with category `a3` showing up despite not being present in the dataframe.
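To see the same effect without grouping, the declared categories can be compared with the values that actually remain after filtering. This is a minimal sketch that uses a smaller, fixed frame in place of the random data from the question; the variable names `declared` and `present` are only illustrative.

```python
import pandas as pd

# Small fixed stand-in for the question's data (values of B are arbitrary).
df = pd.DataFrame({"A": ["a2", "a2", "a1", "a1", "a3", "a3"],
                   "B": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
df["A"] = pd.Categorical(df["A"], categories=["a1", "a2", "a3"])
plotdf = df.loc[df["A"] != "a3"]

# Categories declared on the dtype: filtering rows does not change them.
declared = list(plotdf["A"].cat.categories)
# Values that actually occur in the filtered rows.
present = sorted(plotdf["A"].astype(str).unique())

print(declared)  # ['a1', 'a2', 'a3']
print(present)   # ['a1', 'a2']
```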

In any case, if you don't want to see this empty space on the plot, you can tell seaborn which categories to plot using the `order=` parameter:

sns.barplot(x='A', y='B', data=plotdf, order=['a1', 'a2'])

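Note that if you want to be generic, you could do `order=plotdf['A'].unique()`. Keep in mind that `unique()` returns values in order of first appearance, so this gives `a1`, `a2` here only because the frame was sorted beforehand.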
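Another option, not part of the answer above, is to drop the unused category from the column itself with pandas' `Series.cat.remove_unused_categories()`. The sketch below again uses a small fixed frame rather than the question's random data. It assumes that seaborn takes its tick positions from the column's categories, which is what the behaviour described in the question suggests, so passing the cleaned `plotdf` to `sns.barplot` should then need no `order=` argument.

```python
import pandas as pd

# Small fixed stand-in for the question's data (values of B are arbitrary).
df = pd.DataFrame({"A": ["a2", "a2", "a1", "a1", "a3", "a3"],
                   "B": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]})
df["A"] = pd.Categorical(df["A"], categories=["a1", "a2", "a3"])

# .copy() so the assignment below modifies an independent frame.
plotdf = df.loc[df["A"] != "a3"].copy()

# Drop categories that no longer occur in the filtered rows.
plotdf["A"] = plotdf["A"].cat.remove_unused_categories()

remaining = list(plotdf["A"].cat.categories)
# observed=False asks groupby to report every declared category, used or not.
sizes = plotdf.groupby("A", observed=False).size().to_dict()

print(remaining)  # ['a1', 'a2']
print(sizes)      # {'a1': 2, 'a2': 2}
```

The category order (`a1` before `a2`) is preserved, so the sorting set up with `pd.Categorical` still applies.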
