Python seaborn plotting from dataframe that was filtered using `pd.Categorical`
I'm trying to plot some data from a subset of my dataframe, but the plot shows empty ticks for data that should have been filtered out. I know the issue is that I used `pd.Categorical()`, but I need to. How do I plot only the filtered data (i.e. just `a1` and `a2`) with no extra ticks? Example:
import numpy as np
import pandas as pd
data = {'A':['a2', 'a2', 'a2', 'a1', 'a1', 'a1', 'a3', 'a3', 'a3'],
'B': np.random.normal(0, 1, 9)}
df = pd.DataFrame(data)
df
Out[1]:
A B
0 a2 -1.076173
1 a2 -2.574480
2 a2 0.863081
3 a1 1.411732
4 a1 -0.937692
5 a1 0.929105
6 a3 -1.071276
7 a3 0.901292
8 a3 0.740417
# Sort A using pd.Categorical
df['A'] = pd.Categorical(df['A'], ['a1', 'a2', 'a3'])
df = df.sort_values(by='A')
plotdf = df.loc[df['A']!='a3']
`plotdf` should now be a subset of `df`, which it is:
plotdf
Out[2]:
A B
3 a1 1.411732
4 a1 -0.937692
5 a1 0.929105
0 a2 -1.076173
1 a2 -2.574480
2 a2 0.863081
But when we plot it, the filtered-out tick position is still there:
import matplotlib.pyplot as plt
import seaborn as sns
fig, ax = plt.subplots()
sns.barplot(x='A', y='B', data=plotdf)
plt.show()
Do I need to re-specify the categories before I plot? That seems a bit odd...
This seems to be an effect of the categorical dtype, which keeps all of its declared categories even when some of them no longer occur in the data (see `print(plotdf['A'].dtype)`).
For example, running `plotdf.groupby('A').size()` returns
A
a1 3
a2 3
a3 0
with category `a3` showing up despite not being present in the dataframe.
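To make the retained category visible, and as one possible alternative to fixing it at plot time, here is a minimal sketch that inspects the categories before and after calling pandas' `Series.cat.remove_unused_categories()`. The `B` values are made up for illustration, and `trimmed` is just a name chosen here. I have not checked this against every pandas version, so treat it as a sketch rather than the definitive fix.

```python
import pandas as pd

# Same layout as the question's dataframe; the B values are made up.
df = pd.DataFrame({
    "A": ["a2", "a2", "a2", "a1", "a1", "a1", "a3", "a3", "a3"],
    "B": [0.5, -1.0, 0.2, 1.4, -0.9, 0.9, -1.1, 0.9, 0.7],
})
df["A"] = pd.Categorical(df["A"], ["a1", "a2", "a3"])
plotdf = df.loc[df["A"] != "a3"].sort_values(by="A")

# Filtering rows does not change the dtype: 'a3' is still a declared category.
categories_before = list(plotdf["A"].cat.categories)

# Drop categories that no longer occur; the remaining ones keep their order.
trimmed = plotdf.copy()
trimmed["A"] = trimmed["A"].cat.remove_unused_categories()
categories_after = list(trimmed["A"].cat.categories)

print(categories_before)
print(categories_after)
```

Plotting `trimmed` instead of `plotdf` should then leave no slot for `a3`, since the category is no longer part of the dtype.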
In any case, if you don't want to see this empty space on the plot, you can tell seaborn which categories to plot using the `order=` parameter:
sns.barplot(x='A', y='B', data=plotdf, order=['a1', 'a2'])
Note that if you want to be generic, you could do `order=plotdf['A'].unique()`
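One caveat, as far as I can tell: `unique()` returns values in order of appearance, so it only matches the categorical order if the frame was sorted first (as it was in the question). A small sketch of a variant that follows the declared category order regardless of row order is below. The `B` values are made up, and `present`, `order` and `appearance_order` are names chosen for this example.

```python
import pandas as pd

# Same layout as the question's dataframe; the B values are made up.
df = pd.DataFrame({
    "A": ["a2", "a2", "a2", "a1", "a1", "a1", "a3", "a3", "a3"],
    "B": [0.5, -1.0, 0.2, 1.4, -0.9, 0.9, -1.1, 0.9, 0.7],
})
df["A"] = pd.Categorical(df["A"], ["a1", "a2", "a3"])

# Deliberately NOT sorted here, so the 'a2' rows come before the 'a1' rows.
plotdf = df.loc[df["A"] != "a3"]

# Values that actually occur in the filtered frame.
present = set(plotdf["A"])

# Walk the declared categories in order and keep only those that occur.
order = [c for c in plotdf["A"].cat.categories if c in present]

# For comparison: unique() follows the order of appearance in the rows.
appearance_order = list(plotdf["A"].unique())

print(order)
print(appearance_order)
```

The resulting list can be passed straight to the plotting call, e.g. `sns.barplot(x='A', y='B', data=plotdf, order=order)`.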