Labeling a bar plot created from a grouped pandas DataFrame with a NaN category

Question

I created a nice, tidy grouped DataFrame and then used that data in a simple seaborn barplot. However, when I try to add labels to the bars, I get the following error:

ValueError: cannot convert float NaN to integer

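I understand this happens because one of the grouped categories has only one value (instead of two). How can I label the missing one as "0"?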

I have been down the rabbit hole for a full day and found nothing. Here is what I have tried (in many different ways):

Inserting a row into the grouped DataFrame.
Using DataFrame.fillna().
Creating a function to apply in the labeling clause.

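I work with a lot of data that regularly runs into this kind of problem, so I would really appreciate help solving it. It seems so simple. What am I missing? Thanks!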

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# my initial data set 
d = {'year' : [2014,2014,2014,2015,2015,],
     'status' : ["n","y","n","n","n"],
     'num' : [1,1,1,1,1]}
df = pd.DataFrame(d)

# groupby to create another dataframe
df2 = (df["status"]
    .groupby(df["year"])
    .value_counts(normalize=True)
    .rename("Percent")
    .apply(lambda x: x*100)
    .reset_index())

# create my bar plot
f = plt.figure(figsize = (11,8.5))

ax1 = plt.subplot(2,2,1)
sns.barplot(x="year",
           y="Percent",
           hue="status",
           hue_order = ["n","y"],
           data=df2,
           ci = None)

# label the bars
for p in ax1.patches:
    ax1.text(p.get_x() + p.get_width()/2., p.get_height(), '%d%%' % round(p.get_height()), 
        fontsize=10, color='red', ha='center', va='bottom')

plt.show()
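For reference, the failure can be reproduced without drawing anything. The sketch below rebuilds df2 from the same data, shows that the (2015, "y") combination is absent from the grouped result, and then triggers the same ValueError by calling round() on NaN. The names `present` and `message` are illustrative only and do not appear in the original code.

```python
import pandas as pd

d = {'year': [2014, 2014, 2014, 2015, 2015],
     'status': ["n", "y", "n", "n", "n"],
     'num': [1, 1, 1, 1, 1]}
df = pd.DataFrame(d)

df2 = (df["status"]
       .groupby(df["year"])
       .value_counts(normalize=True)
       .rename("Percent")
       .apply(lambda x: x * 100)
       .reset_index())

# Only three (year, status) pairs exist; (2015, "y") is absent.
present = set(zip(df2["year"], df2["status"]))
print(present)

# round() cannot turn NaN into an int, which is the error from the question.
try:
    round(float("nan"))
    message = None
except ValueError as exc:
    message = str(exc)
print(message)
```

Because hue_order asks for both "n" and "y", a bar slot still exists for the missing pair, and that slot has nothing to report as a height.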

Answer 1

Handle the empty-bar case by setting the height to zero whenever p.get_height() returns NaN:

import numpy as np

for p in ax1.patches:
    height = p.get_height()
    # an empty bar reports NaN as its height; treat it as zero
    if np.isnan(height):
        height = 0
    ax1.text(p.get_x() + p.get_width()/2., height, '%d%%' % round(height),
        fontsize=10, color='red', ha='center', va='bottom')

This gives me a plot in which the empty bar is labeled 0%.
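If the same labeling is needed on several plots, the NaN guard can be pulled into a small helper. This is only a sketch: `pct_label` is a hypothetical name, and it uses math.isnan from the standard library in place of np.isnan.

```python
import math

def pct_label(height):
    """Return a percent label, treating a NaN height as zero."""
    if math.isnan(height):
        height = 0
    return '%d%%' % round(height)

# The three real bar heights from the example plus one empty (NaN) bar.
labels = [pct_label(h) for h in (66.666667, 33.333333, 100.0, float("nan"))]
print(labels)  # ['67%', '33%', '100%', '0%']
```

Inside the loop, the label argument of ax1.text would then be pct_label(p.get_height()). The y position passed to ax1.text still needs the same NaN-to-zero guard, since the helper only builds the string.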

Alternatively, you can expand the frame so that the missing combination is present with a value of zero:

non_data_cols = df2.columns.drop("Percent")
full_index = pd.MultiIndex.from_product([df[col].unique() for col in non_data_cols], names=non_data_cols)
df2 = df2.set_index(non_data_cols.tolist()).reindex(full_index).fillna(0).reset_index()

The expanded frame is:

In [74]: df2
Out[74]: 
   year status     Percent
0  2014      n   66.666667
1  2014      y   33.333333
2  2015      n  100.000000
3  2015      y    0.000000

Answer 2

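A common trick when you are working with data that has missing categories is to unstack and then re-stack the data. The general idea can be seen in this answer. Once the data is reshaped, you can call fillna with your fill value (0 in this case) and leave the rest of your code as it is.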

All you have to do is replace your current creation of df2 with the code below.

df2 = (df.groupby('year').status.value_counts(normalize=True).mul(100)
          .unstack().stack(dropna=False).fillna(0)
          .rename('Percent').reset_index())

This gives us:

   year status     Percent
0  2014      n   66.666667
1  2014      y   33.333333
2  2015      n  100.000000
3  2015      y    0.000000
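Newer pandas releases have been changing how stack() handles the dropna argument, so stack(dropna=False) may warn or be rejected depending on the installed version. One variation, which is not part of the original answer, is to fill the gap during unstack via its fill_value parameter, so that a plain stack() has no NaN to drop. A minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({'year': [2014, 2014, 2014, 2015, 2015],
                   'status': ["n", "y", "n", "n", "n"]})

df2 = (df.groupby('year')['status'].value_counts(normalize=True).mul(100)
         .unstack(fill_value=0)   # wide table: one column per status, gaps become 0
         .stack()                 # back to long form; the zeros are kept
         .rename('Percent').reset_index())
print(df2)
```

The result should have the same four rows as the table above, including the (2015, "y") row with a Percent of 0.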

Now, without changing the plotting code, every bar gets a label, including 0% for the missing category.
