How to calculate percentages when the total is not 100%?

Question

I'm very new to pandas and matplotlib.

I ran a questionnaire, and one question asked people which social networks they use. The options were Facebook, Instagram, Twitter, and others. Respondents could select more than one option.

I want to organize this data to plot a bar chart. I have used the following code:

listsocial = df["SocialNetworks"].str.split(', ', expand=True)

listsocial.head()

listsocial = 100*listsocial.stack().value_counts(normalize=True)

and then:

sns.set(font_scale=1.4)

ax = listsocial.plot(kind='bar', figsize=(15,7), color=('#009C3B'), grid=True)
ax.yaxis.set_major_formatter(mtick.PercentFormatter(decimals=False))
plt.xticks(rotation=80)
plt.suptitle('Most used social networks', fontsize=20)
plt.xlabel('Social network', fontsize=14, labelpad=20)
plt.ylabel('Respondents\n(%)', fontsize=14, labelpad=20)

plt.show()

However, the result does not take into account that people could select more than one option, so the total should not be 100%. I want the chart to display data like: 70% use Facebook, 60% use Instagram, etc.

Thanks in advance.

Answer 1

Splitting and stacking is not the way to go in this case.

I would create a separate column for each social network of interest and assign True if it appears in the string (a sort of one-hot encoding):

social_networks = pd.DataFrame()
for sn in ['Facebook', 'Twitter', ...]:
    social_networks[sn] = df['SocialNetworks'].str.contains(sn)

Then you can get the fraction of respondents who selected each network (multiply by 100 for a percentage) with:

social_networks = social_networks.mean()
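As a minimal sketch of the approach above, here it is applied to a tiny data set. The four answers and the `networks` list are invented for illustration; only the `SocialNetworks` column name comes from the question. Passing `regex=False` makes `str.contains` do a plain substring match, which assumes no option name is a substring of another.

```python
import pandas as pd

# Hypothetical survey answers, invented for illustration only.
df = pd.DataFrame({"SocialNetworks": [
    "Facebook, Instagram",
    "Facebook",
    "Instagram, Twitter",
    "Facebook, Twitter, Instagram",
]})

# One boolean column per network: True if the respondent selected it.
networks = ["Facebook", "Instagram", "Twitter"]
flags = pd.DataFrame({
    sn: df["SocialNetworks"].str.contains(sn, regex=False) for sn in networks
})

# The mean of a boolean column is the fraction of True values,
# so multiplying by 100 gives the percentage of respondents.
percent = 100 * flags.mean()
print(percent)
```

The resulting Series can be passed to `.plot(kind='bar')` exactly like `listsocial` in the question, and its values are free to add up to more than 100.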

Answer 2

Instead of calling value_counts(normalize=True) you could divide by the number of rows:

from matplotlib import pyplot as plt
from matplotlib import ticker as mtick
import numpy as np
import pandas as pd
import seaborn as sns

networks = np.array(['facebook', 'twitter', 'instagram', 'other'])
socnetw = [", ".join(networks[np.random.randint(0, 2, 4, dtype=bool)]) for _ in range(100)]
df = pd.DataFrame({"SocialNetworks": socnetw})

listsocial = df["SocialNetworks"].str.split(', ', expand=True)
listsocial = 100 * listsocial.stack().value_counts() / len(listsocial)
listsocial = listsocial.drop('', errors='ignore')  # drop the empty-string entry left by respondents who selected nothing

sns.set(font_scale=1.4)

ax = listsocial.plot(kind='bar', figsize=(15, 7), color=('#009C3B'), grid=True)
ax.yaxis.set_major_formatter(mtick.PercentFormatter(decimals=False))
plt.xticks(rotation=80)
plt.suptitle('Most used social networks', fontsize=20)
plt.xlabel('Social network', fontsize=14, labelpad=20)
plt.ylabel('Respondents (%)', fontsize=14, labelpad=20)
plt.tight_layout()
plt.show()
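To see why the denominator matters, the following small sketch compares the two normalizations on invented data (the four answers are hypothetical). It uses `Series.explode` instead of `stack`, which is an alternative way to get one row per selection, not the method used in the answer above.

```python
import pandas as pd

# Hypothetical answers from four respondents, for illustration only.
df = pd.DataFrame({"SocialNetworks": [
    "Facebook, Instagram",
    "Facebook",
    "Instagram, Twitter",
    "Facebook",
]})

# One row per selected network (six selections in total).
selections = df["SocialNetworks"].str.split(", ").explode()
counts = selections.value_counts()

# Dividing by the number of selections is what normalize=True does:
# the shares always add up to 100.
share_of_selections = 100 * counts / counts.sum()

# Dividing by the number of respondents gives "x% use this network";
# the shares can add up to more than 100.
share_of_respondents = 100 * counts / len(df)

print(share_of_selections)
print(share_of_respondents)
```

Here Facebook accounts for 3 of 6 selections but was chosen by 3 of 4 respondents, so only the second Series matches the "70% use Facebook" style of statement the question asks for.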
