Labeling a Bar Graph Created from a Grouped Pandas DataFrame where there's a NaN Category
I create a nice and tidy grouped data frame and then use that data in a simple seaborn barplot. However, when I try to add labels to the bars, I get the following error:
ValueError: cannot convert float NaN to integer
I know this is because there is only one value (instead of two) for one of the grouped categories. How do I get it to label that bar "0"?
I've gone down the rabbit hole on this for a full day and haven't found anything. Here are the things that I've tried (in many different ways):
pd.fillna()
I work with a lot of data that frequently runs into this sort of problem, so I would really appreciate some help in solving this. It seems so simple. What am I missing? Thanks!
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# my initial data set
d = {'year' : [2014,2014,2014,2015,2015,],
     'status' : ["n","y","n","n","n"],
     'num' : [1,1,1,1,1]}
df = pd.DataFrame(d)

# groupby to create another dataframe
df2 = (df["status"]
       .groupby(df["year"])
       .value_counts(normalize=True)
       .rename("Percent")
       .apply(lambda x: x*100)
       .reset_index())

# create my bar plot
f = plt.figure(figsize = (11,8.5))
ax1 = plt.subplot(2,2,1)
sns.barplot(x="year",
            y="Percent",
            hue="status",
            hue_order = ["n","y"],
            data=df2,
            ci = None)

# label the bars
for p in ax1.patches:
    ax1.text(p.get_x() + p.get_width()/2., p.get_height(), '%d%%' % round(p.get_height()),
             fontsize=10, color='red', ha='center', va='bottom')

plt.show()
You could handle the empty-bar case by setting the height to zero if p.get_height() returns NaN:
import numpy as np

for p in ax1.patches:
    height = p.get_height()
    # an empty bar reports a NaN height; label it as 0 instead
    if np.isnan(height):
        height = 0
    ax1.text(p.get_x() + p.get_width()/2., height, '%d%%' % round(height),
             fontsize=10, color='red', ha='center', va='bottom')
This gives me a plot in which the empty bar is labeled 0%.
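To see where the error comes from, note that Python's built-in round() cannot turn NaN into an integer. The sketch below reproduces that and wraps the guard from the loop above in a small helper. The name percent_label is hypothetical, used only for illustration, and it relies on math.isnan from the standard library rather than numpy.

```python
import math

def percent_label(height):
    """Return a label such as '33%'; a NaN height (an empty bar) becomes '0%'."""
    # hypothetical helper, not part of seaborn or matplotlib
    if math.isnan(height):
        height = 0
    return '%d%%' % round(height)

# round() on NaN is what raises the error from the question
try:
    round(float('nan'))
except ValueError as exc:
    print(exc)  # cannot convert float NaN to integer

print(percent_label(float('nan')))  # 0%
print(percent_label(33.333333))     # 33%
```

Inside the labeling loop, the text argument would then become percent_label(p.get_height()), with the y position guarded the same way.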
Alternatively, you could expand your frame to ensure there's a zero there:
non_data_cols = df2.columns.drop("Percent")
full_index = pd.MultiIndex.from_product([df[col].unique() for col in non_data_cols], names=non_data_cols)
df2 = df2.set_index(non_data_cols.tolist()).reindex(full_index).fillna(0).reset_index()
which expands to give me
In [74]: df2
Out[74]:
   year status     Percent
0  2014      n   66.666667
1  2014      y   33.333333
2  2015      n  100.000000
3  2015      y    0.000000
When dealing with data that has missing categories, a common trick is stacking and unstacking the data. The general idea can be viewed in this answer. Once the data is formatted, you can fillna with your fill value (in this case 0) and leave your code as is.
All you have to do is replace your current creation of df2 with the code below.
df2 = (df.groupby('year').status.value_counts(normalize=True).mul(100)
         .unstack().stack(dropna=False).fillna(0)
         .rename('Percent').reset_index())
Which gives us:
   year status     Percent
0  2014      n   66.666667
1  2014      y   33.333333
2  2015      n  100.000000
3  2015      y    0.000000
Now, with no changes to your plotting code, the labels are drawn without the error and the empty bar is labeled 0%.
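As a variation on the unstack/stack approach (not part of the original answer), Series.unstack accepts a fill_value argument, so the zero can be filled in during the reshape instead of with a separate fillna call. A minimal sketch, assuming the same df as in the question:

```python
import pandas as pd

# same data as in the question
df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015],
                   'status': ["n", "y", "n", "n", "n"],
                   'num': [1, 1, 1, 1, 1]})

df2 = (df.groupby('year')['status']
         .value_counts(normalize=True)
         .mul(100)
         .unstack(fill_value=0)   # missing (year, status) pairs become 0
         .stack()                 # back to one row per (year, status)
         .rename('Percent')
         .reset_index())

print(df2)
```

Because no NaN is left after the unstack, this version does not rely on the dropna argument of stack.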