Python + Pandas + data visualization: how to get per-row percentages and visualize categorical data?

Question

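I am doing exploratory data analysis on a loan prediction dataset (a pandas DataFrame). The DataFrame has two columns: Property_Area, which takes three values (Rural, Urban, Semiurban), and Loan_Status, which takes two values (Y, N). I want to draw a plot with Property_Area on the X axis and, for each of the three area types, the percentage of loans accepted or rejected on the Y axis. How can I do this?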

Here is a sample of my data:

import pandas as pd

data = pd.DataFrame({'Loan_Status': ['N','Y','Y','Y','Y','N','N','Y','N','Y','N'],
       'Property_Area': ['Rural', 'Urban','Urban','Urban','Urban','Urban',
       'Semiurban','Urban','Semiurban','Rural','Semiurban']})

I tried this:

status = data['Loan_Status']
index = data['Property_Area']
df = pd.DataFrame({'Loan Status' : status}, index=index)
ax = df.plot.bar(rot=0)

Here, data is the DataFrame holding the original dataset.

Output:

Edit: I was able to do what I wanted, but I had to write a lot of code for it:

import numpy as np
import matplotlib.pyplot as plt

new_data = data[['Property_Area', 'Loan_Status']].copy()
count_rural_y = new_data[(new_data.Property_Area == 'Rural') & (new_data.Loan_Status == 'Y')].count()
count_rural = new_data[(new_data.Property_Area == 'Rural')].count()
#print(count_rural.iloc[0])
#print(count_rural_y.iloc[0])
rural_y_percent = (count_rural_y.iloc[0]/count_rural.iloc[0])*100
#print(rural_y_percent)

#print("-"*50)

count_urban_y = new_data[(new_data.Property_Area == 'Urban') & (new_data.Loan_Status == 'Y')].count()
count_urban = new_data[(new_data.Property_Area == 'Urban')].count()
#print(count_urban.iloc[0])
#print(count_urban_y.iloc[0])
urban_y_percent = (count_urban_y.iloc[0]/count_urban.iloc[0])*100
#print(urban_y_percent)

#print("-"*50)

count_semiurban_y = new_data[(new_data.Property_Area == 'Semiurban') & (new_data.Loan_Status == 'Y')].count()
count_semiurban = new_data[(new_data.Property_Area == 'Semiurban')].count()
#print(count_semiurban.iloc[0])
#print(count_semiurban_y.iloc[0])
semiurban_y_percent = (count_semiurban_y.iloc[0]/count_semiurban.iloc[0])*100
#print(semiurban_y_percent)

#print("-"*50)

objects = ('Rural', 'Urban', 'Semiurban')
y_pos = np.arange(len(objects))
performance = [rural_y_percent,urban_y_percent,semiurban_y_percent]
plt.bar(y_pos, performance, align='center', alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('Loan Approval Percentage')
plt.title('Area Wise Loan Approval Percentage')

plt.show()

Output:

If possible, could you suggest a simpler way to do this?

Answer 1

Pandas `crosstab` with `normalize` can simplify this

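A simple way to take two or more columns of a pandas DataFrame and get the percentages within each row is to use the pandas crosstab function with normalize = 'index'. With this setting, every row of the resulting table is divided by its own row total.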

The crosstab call looks like this:

import pandas as pd
import matplotlib.pyplot as plt

# Crosstab with "normalize = 'index'", so each row sums to 1.
df_percent = pd.crosstab(data.Property_Area,data.Loan_Status,
                         normalize = 'index').rename_axis(None)

# Multiply all percentages by 100 for graphing. 
df_percent *= 100

This gives the following df_percent:

Loan_Status          N          Y
Rural        50.000000  50.000000
Semiurban    66.666667  33.333333
Urban        16.666667  83.333333
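As a cross-check, the same table can be built without crosstab. The sketch below is not part of the original answer: it assumes the sample DataFrame used in this answer, and the names via_crosstab and via_groupby are purely illustrative. It groups by Property_Area, takes the normalized value counts of Loan_Status, and pivots them into columns.

```python
import pandas as pd

# Sample frame from this answer (the last Loan_Status value is 'Y').
data = pd.DataFrame({
    'Loan_Status': ['N', 'Y', 'Y', 'Y', 'Y', 'N', 'N', 'Y', 'N', 'Y', 'Y'],
    'Property_Area': ['Rural', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban',
                      'Semiurban', 'Urban', 'Semiurban', 'Rural', 'Semiurban'],
})

# Row-normalized crosstab, scaled to percentages.
via_crosstab = pd.crosstab(data.Property_Area, data.Loan_Status,
                           normalize='index') * 100

# Same numbers via groupby: the share of each Loan_Status within each area,
# pivoted so that the statuses become columns.
via_groupby = (
    data.groupby('Property_Area')['Loan_Status']
        .value_counts(normalize=True)
        .unstack(fill_value=0)
    * 100
)

print(via_groupby.round(2))
```

Both approaches should agree; crosstab is simply the shorter spelling when only two columns are involved.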

You can then easily plot it:

# Plot only approvals as bar graph. 
plt.bar(df_percent.index, df_percent.Y, align='center', alpha=0.5)
plt.ylabel('Loan Approval Percentage')
plt.title('Area Wise Loan Approval Percentage')

plt.show()

This produces the resulting plot:

You can see the code running in Google Colab here.
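The question asks for accepted or rejected percentages, so it may help to show both statuses at once. The following is a minimal sketch, assuming the sample DataFrame from this answer; plot_status_percentages is a hypothetical helper name, not something from pandas or from the original answer. It uses the DataFrame's own plot.bar with stacked=True so that each area gets one bar split into N and Y shares.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample frame from this answer (the last Loan_Status value is 'Y').
data = pd.DataFrame({
    'Loan_Status': ['N', 'Y', 'Y', 'Y', 'Y', 'N', 'N', 'Y', 'N', 'Y', 'Y'],
    'Property_Area': ['Rural', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban',
                      'Semiurban', 'Urban', 'Semiurban', 'Rural', 'Semiurban'],
})


def plot_status_percentages(frame):
    """Draw one stacked bar per Property_Area, split by Loan_Status share.

    Hypothetical helper for illustration; returns the matplotlib Axes.
    """
    percent = pd.crosstab(frame['Property_Area'], frame['Loan_Status'],
                          normalize='index') * 100
    ax = percent.plot.bar(stacked=True, rot=0)
    ax.set_xlabel('Property_Area')
    ax.set_ylabel('Percentage of loans')
    ax.set_title('Area Wise Loan Status Percentage')
    ax.legend(title='Loan_Status')
    return ax


ax = plot_status_percentages(data)
# plt.show()  # uncomment to display the figure when running interactively
```

Because every row of the normalized crosstab sums to 100, each stacked bar has the same total height, which makes the Y/N split easy to compare across areas.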

Here is the sample DataFrame I generated for this answer:

data = pd.DataFrame({
    'Loan_Status': ['N','Y','Y','Y','Y','N','N','Y','N','Y','Y'],
    'Property_Area': ['Rural', 'Urban','Urban','Urban','Urban','Urban',
                      'Semiurban','Urban','Semiurban','Rural','Semiurban']})

This creates the following sample DataFrame:

   Loan_Status Property_Area
0            N         Rural
1            Y         Urban
2            Y         Urban
3            Y         Urban
4            Y         Urban
5            N         Urban
6            N     Semiurban
7            Y         Urban
8            N     Semiurban
9            Y         Rural
10           Y     Semiurban
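If only the approval rate is needed, a shorter route may be enough. This sketch is an addition rather than part of the original answer; it assumes the sample DataFrame above, and the variable names approved and approval_percent are illustrative. It relies on the fact that the mean of a boolean column is the fraction of True values.

```python
import pandas as pd

# Sample frame from this answer (the last Loan_Status value is 'Y').
data = pd.DataFrame({
    'Loan_Status': ['N', 'Y', 'Y', 'Y', 'Y', 'N', 'N', 'Y', 'N', 'Y', 'Y'],
    'Property_Area': ['Rural', 'Urban', 'Urban', 'Urban', 'Urban', 'Urban',
                      'Semiurban', 'Urban', 'Semiurban', 'Rural', 'Semiurban'],
})

# True where the loan was approved.
approved = data['Loan_Status'].eq('Y')

# Averaging the boolean mask per area gives the approval share; scale to percent.
approval_percent = approved.groupby(data['Property_Area']).mean() * 100

print(approval_percent.round(2))
```

The resulting Series is indexed by Property_Area and can be passed to plt.bar in the same way as df_percent.Y.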

Answer 1
0 2018-11-17 19:03:45