Plot groupby percentage dataframe
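I couldn't find a complete answer for what I'm trying to do:
I have a dataframe. I want to group by user and their answers to a survey, sum each user's good answers, divide by their total number of answers, show the result as a percentage, and plot it.
I have an answer column that contains 1, 0 or -1. I want to filter it to exclude the -1 values.
Here is what I have done so far: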
df_sample.groupby('user').filter(lambda x : x['answer'].mean() >-1)
Or:
a = df_sample.loc[df_sample['answer']!=-1,['user','answer']]
b = a.groupby(['user','answer']).agg({'answer' : 'sum'})
I can't see how to get the rest of the way. Thanks for any suggestions.
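Here is a sample solution, assuming you want the percentage computed against the filtered dataframe as a whole (each user's sum divided by the number of rows left after filtering).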
import pandas as pd
import numpy as np

# Ten random answers drawn from {-1, 0, 1}
df_sample = pd.DataFrame(np.random.randint(-1, 2, size=(10, 1)), columns=['answer'])
df_sample['user'] = 'a b c d e f a b c d'.split(' ')
# Drop the -1 answers
df_filtered = df_sample[df_sample.answer > -1]
# Each user's sum of answers as a percentage of all remaining rows
print(df_filtered.groupby('user').agg({'answer': lambda x: x.sum() / len(df_filtered) * 100}))
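The snippet above uses the row count of the whole filtered frame as the denominator. If the goal is instead each user's good answers divided by that user's own number of answers, as the question describes, a minimal sketch could look like the following. The hand-written frame and the names `share_of_all` and `share_per_user` are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical toy data: 1 = good answer, 0 = bad answer, -1 = to be excluded
df_sample = pd.DataFrame({
    'user':   ['a', 'a', 'a', 'b', 'b', 'b'],
    'answer': [1,   0,   -1,  1,   1,   0],
})

# Drop the -1 answers before any counting
df_filtered = df_sample[df_sample['answer'] > -1]
grouped = df_filtered.groupby('user')['answer']

# Denominator = all remaining rows (what the snippet above computes)
share_of_all = grouped.sum() / len(df_filtered) * 100

# Denominator = each user's own remaining rows
share_per_user = grouped.sum() / grouped.size() * 100

print(share_of_all)
print(share_per_user)
```

With the first denominator the per-user values add up to the overall share of good answers; with the second, each user is scored independently of how many answers the others gave.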
Let's try it with some sample data:
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
np.random.seed(5)
n = 100
df = pd.DataFrame({'user': np.random.choice(list("ABCD"), size=n),
                   'answer': np.random.choice([1, 0, -1], size=n)})
df.head():
  user  answer
0    D       1
1    C       0
2    D      -1
3    B       1
4    C       1
Filter out the -1 values and use named aggregation to get the "good answers" and "total answers":
plot_df = df[df['answer'].ne(-1)].groupby('user').aggregate(
    good_answer=('answer', 'sum'),
    total_answer=('answer', 'size')
)
plot_df:
      good_answer  total_answer
user
A               9            15
B              11            20
C              15            19
D               7            14
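Summing the column counts the good answers only because, once the -1 rows are gone, every remaining value is 0 or 1. As a sanity check on that reasoning, here is a small sketch on hand-written data that compares the named aggregation with an explicit count of rows equal to 1. The frame `toy` and the name `explicit_good` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data with a few -1 rows mixed in
toy = pd.DataFrame({
    'user':   ['A', 'A', 'A', 'B', 'B', 'B', 'B'],
    'answer': [1,   0,   -1,  1,   1,   -1,  0],
})

# Keep only the 0/1 answers
kept = toy[toy['answer'].ne(-1)]

# Named aggregation: sum of 0/1 values and number of rows per user
counts = kept.groupby('user').aggregate(
    good_answer=('answer', 'sum'),
    total_answer=('answer', 'size'),
)

# Cross-check: count the rows that are exactly 1 for each user
explicit_good = kept['answer'].eq(1).groupby(kept['user']).sum()

print(counts)
print(explicit_good)
```

If the column could ever hold values other than 0 and 1 after filtering, the explicit `eq(1)` count would be the safer of the two.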
Divide and multiply by 100 to get the percentage:
plot_df['pct'] = (plot_df['good_answer'] / plot_df['total_answer'] * 100)
plot_df:
      good_answer  total_answer        pct
user
A               9            15  60.000000
B              11            20  55.000000
C              15            19  78.947368
D               7            14  50.000000
This can then be plotted with DataFrame.plot:
ax = plot_df.plot(
    y='pct', kind='bar', rot=0,
    title='Percentage of Good Answers',
    ylim=[0, 100],
    label='Percent Good'
)
# Add Labels on Top of Bars
for container in ax.containers:
    ax.bar_label(container, fmt='%.2f%%')
plt.show()
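Axes.bar_label was added in Matplotlib 3.4. On an older version, one possible fallback is to annotate each bar patch by hand. The sketch below is only an illustration under stated assumptions: it rebuilds the percentages from the table above as a standalone Series named `pct`, selects the non-interactive Agg backend so it can run without a display, and collects the label strings in a hypothetical list `labels`.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: headless run; drop this line in an interactive session
from matplotlib import pyplot as plt
import pandas as pd

# Percentages copied from the table above
pct = pd.Series([60.0, 55.0, 78.947368, 50.0], index=list('ABCD'), name='pct')

ax = pct.plot(
    kind='bar', rot=0,
    title='Percentage of Good Answers',
    ylim=[0, 100],
)

# Write each bar's height just above the bar, centered horizontally
labels = []
for patch in ax.patches:
    height = patch.get_height()
    text = ax.annotate(
        f'{height:.2f}%',
        (patch.get_x() + patch.get_width() / 2, height),
        ha='center', va='bottom',
    )
    labels.append(text.get_text())

print(labels)
```

In an interactive session, call plt.show() afterwards to display the figure.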
If only the percentages are needed, groupby mean can be used to get the resulting plot directly after filtering out the -1 values:
plot_df = df[df['answer'].ne(-1)].groupby('user')['answer'].mean().mul(100)
ax = plot_df.plot(
    kind='bar', rot=0,
    title='Percentage of Good Answers',
    ylim=[0, 100],
    label='Percent Good'
)
# Add Labels on Top of Bars
for container in ax.containers:
    ax.bar_label(container, fmt='%.2f%%')
plt.show()
plot_df:
         answer
user
A     60.000000
B     55.000000
C     78.947368
D     50.000000
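The mean shortcut works because the mean of a 0/1 column is exactly the number of ones divided by the number of rows. It also depends on removing the -1 rows first, since any that remain would pull the mean down. A minimal sketch on hand-written data illustrates both points; the frame `toy` and the names `via_counts`, `via_mean` and `unfiltered_mean` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy data: each user has one -1 answer that should be ignored
toy = pd.DataFrame({
    'user':   ['A', 'A', 'A', 'A', 'B', 'B', 'B'],
    'answer': [1,   1,   0,   -1,  1,   0,   -1],
})

kept = toy[toy['answer'].ne(-1)]
grouped = kept.groupby('user')['answer']

# Two routes to the same percentage on the filtered data
via_counts = grouped.sum() / grouped.size() * 100
via_mean = grouped.mean().mul(100)

# Skipping the filter gives a different (misleading) number
unfiltered_mean = toy.groupby('user')['answer'].mean().mul(100)

print(via_counts)
print(via_mean)
print(unfiltered_mean)
```

For user A the filtered result is about 66.67, while the unfiltered mean is 25.0, which is why the filter has to come before the groupby.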
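Both options produce the same bar chart of the percentage of good answers per user (figure not included here).
All together: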
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

np.random.seed(5)
n = 100
df = pd.DataFrame({'user': np.random.choice(list("ABCD"), size=n),
                   'answer': np.random.choice([1, 0, -1], size=n)})

plot_df = df[df['answer'].ne(-1)].groupby('user').aggregate(
    good_answer=('answer', 'sum'),
    total_answer=('answer', 'size')
)
plot_df['pct'] = (plot_df['good_answer'] / plot_df['total_answer'] * 100)

ax = plot_df.plot(
    y='pct', kind='bar', rot=0,
    title='Percentage of Good Answers',
    ylim=[0, 100],
    label='Percent Good'
)
# Add Labels on Top of Bars
for container in ax.containers:
    ax.bar_label(container, fmt='%.2f%%')
plt.show()