I create a nice and tidy grouped data frame and then I use that data in a simple seaborn barplot. However, when I try to add labels to the bars, I get the following error:
ValueError: cannot convert float NaN to integer
I know this is because there is only one value (instead of two) for one of the grouped categories. How do I get it to label it "0"?
I've gone down the rabbit hole on this for a full day and haven't found anything. Here is what I've tried (in many different ways): pd.fillna().
I work with a lot of data that frequently runs into this sort of problem, so I would really appreciate some help in solving this. It seems so simple. What am I missing? Thanks!
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# my initial data set
d = {'year': [2014, 2014, 2014, 2015, 2015],
     'status': ["n", "y", "n", "n", "n"],
     'num': [1, 1, 1, 1, 1]}
df = pd.DataFrame(d)
# groupby to create another dataframe
df2 = (df["status"]
       .groupby(df["year"])
       .value_counts(normalize=True)
       .rename("Percent")
       .apply(lambda x: x*100)
       .reset_index())
# create my bar plot
f = plt.figure(figsize = (11,8.5))
ax1 = plt.subplot(2,2,1)
sns.barplot(x="year",
            y="Percent",
            hue="status",
            hue_order=["n", "y"],
            data=df2,
            ci=None)
# label the bars
for p in ax1.patches:
    ax1.text(p.get_x() + p.get_width()/2., p.get_height(), '%d%%' % round(p.get_height()),
             fontsize=10, color='red', ha='center', va='bottom')
plt.show()
You could handle the empty-bar case by setting the height to zero if p.get_height() returns NaN:
import numpy as np

for p in ax1.patches:
    height = p.get_height()
    # an empty bar reports NaN as its height; label it as 0 instead
    if np.isnan(height):
        height = 0
    ax1.text(p.get_x() + p.get_width()/2., height, '%d%%' % round(height),
             fontsize=10, color='red', ha='center', va='bottom')
This labels the empty bar 0% instead of raising the error.
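If you hit this often, the NaN check can be pulled out into a small helper. The following is only a sketch: the name percent_label is hypothetical (it is not part of seaborn or matplotlib), and it uses math.isnan from the standard library in place of np.isnan.

```python
import math


def percent_label(height):
    """Format a bar height as a label such as '33%'.

    A NaN height (an empty bar) is treated as 0. The function name is
    hypothetical and only illustrates the check described above.
    """
    if math.isnan(height):
        height = 0
    return '%d%%' % round(height)


print(percent_label(33.333333))
print(percent_label(float('nan')))
```

Inside the labeling loop you would pass p.get_height() to the helper for the text, and still replace a NaN height with 0 for the y position of the label.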
Alternatively, you could expand your frame to ensure there's a zero there:
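non_data_cols = df2.columns.drop("Percent")
# build every year/status combination, including those missing from df2
full_index = pd.MultiIndex.from_product(
    [df[col].unique() for col in non_data_cols], names=non_data_cols)
# missing combinations appear as NaN after reindexing, then become 0
df2 = df2.set_index(non_data_cols.tolist()).reindex(full_index).fillna(0).reset_index()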
which expands to give me
In [74]: df2
Out[74]:
   year status     Percent
0  2014      n   66.666667
1  2014      y   33.333333
2  2015      n  100.000000
3  2015      y    0.000000
When dealing with data that has missing categories, a common trick is to unstack and then re-stack the data. The general idea is described in this answer. Unstacking creates a cell for every year/status combination, so the missing ones show up as NaN. You can then fillna with your fill value (in this case 0) and leave your plotting code as is.
All you have to do is replace your current creation of df2 with the code below.
df2 = (df.groupby('year').status.value_counts(normalize=True).mul(100)
         .unstack().stack(dropna=False).fillna(0)
         .rename('Percent').reset_index())
Which gives us:
   year status     Percent
0  2014      n   66.666667
1  2014      y   33.333333
2  2015      n  100.000000
3  2015      y    0.000000
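As a variation on the same round trip, here is a minimal, self-contained sketch that fills the gap during the unstack step instead of afterwards. Using unstack(fill_value=0) is my substitution for stack(dropna=False).fillna(0), not what the code above does; it may be handy if the dropna argument of stack behaves differently in your pandas version. The intermediate names pct and wide are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015],
                   'status': ['n', 'y', 'n', 'n', 'n']})

# percentage of each status within each year (only combinations that occur)
pct = df.groupby('year')['status'].value_counts(normalize=True).mul(100)

# unstack moves status into columns; fill_value=0 fills the missing 2015/"y" cell
wide = pct.unstack(fill_value=0)

# stack back to one row per year/status pair
df2 = wide.stack().rename('Percent').reset_index()
print(df2)
```

The resulting frame has one row for each year/status pair, so the plotting code receives an explicit zero instead of a missing bar.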
Now, with no changes to your plotting code, the labeling loop runs without the error, and the missing 2015 "y" bar is labeled 0%.