How to aggregate group metrics and plot data with pandas

Question

I want a pie chart that compares the age groups of the people who survived. The problem is that I don't know how to count people of the same age. As you can see at the bottom of the screenshot, the output has 142 rows, but there are 891 people in the dataset.

import pandas as pd
import seaborn as sns  # for test data only

# load test data from seaborn
df_t = sns.load_dataset('titanic')

# capitalize the column headers to match code used below
df_t.columns = df_t.columns.str.title()

dft = df_t.groupby(['Age', 'Survived']).size().reset_index(name='count')

def get_num_people_by_age_category(dft):
    dft["age_group"] = pd.cut(x=dft['Age'], bins=[0,18,60,100], labels=["young","middle_aged","old"])
    return dft

# Call function
dft = get_num_people_by_age_category(dft)
print(dft)

output

Answer 1

Calling df_t.groupby(['Age', 'Survived']).size().reset_index(name='count') creates a dataframe with one row per age and per survived status. Each row is a distinct (age, survived) combination together with its count, not a single passenger, which is why the result has 142 entries rather than 891.
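To make the row count concrete, here is a minimal sketch on a small made-up table (the toy values are hypothetical and only stand in for the titanic data). It shows that the grouped frame has one row per distinct pair, while the count column sums to the number of rows with a known age, since groupby drops missing keys by default.

```python
import pandas as pd

# Hypothetical toy data standing in for the titanic dataset
toy = pd.DataFrame({
    'Age': [22.0, 22.0, 22.0, 35.0, 35.0, 4.0, None],
    'Survived': [1, 1, 0, 0, 0, 1, 1],
})

# one row per distinct (Age, Survived) pair; rows with a missing Age are dropped
per_age = toy.groupby(['Age', 'Survived']).size().reset_index(name='count')

print(per_age)
print(len(per_age))            # number of distinct (Age, Survived) pairs
print(per_age['count'].sum())  # number of passengers with a known age
```

Seven passengers collapse into four rows here, in the same way that the full dataset collapses into far fewer rows than it has passengers.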

To get the counts per age group, an "age group" column can be added to the original dataframe. In the next step, groupby can use that "age group".
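As a quick illustration of the binning step, the sketch below applies pd.cut to a handful of made-up ages (hypothetical values, not taken from the dataset). With the default right=True, each bin includes its right edge, so 18 lands in the first group, and a missing age stays missing.

```python
import pandas as pd

# Hypothetical ages, including bin edges and a missing value
ages = pd.Series([0.42, 18, 18.5, 60, 80, None])

# bins are (0, 18], (18, 60], (60, 100]
groups = pd.cut(ages, bins=[0, 18, 60, 100],
                labels=['young', 'middle aged', 'old'])

print(groups.tolist())
```

Passengers without a recorded age therefore get no age group and drop out of any later groupby on that column.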

from matplotlib import pyplot as plt
import seaborn as sns  # to load the titanic dataset
import pandas as pd

df_t = sns.load_dataset('titanic')
df_t["age_group"] = pd.cut(x=df_t['age'], bins=[0, 18, 60, 100], labels=["young", "middle aged", "old"])

df_per_age = df_t.groupby(['age_group', 'survived']).size().reset_index(name='count')
labels = [f'{age_group},\n {"survived" if survived == 1 else "not survived"}'
          for age_group, survived in df_per_age[['age_group', 'survived']].values]
labels[-1] = labels[-1].replace('\n', ' ') # remove newline for the last items as the wedges are too thin
labels[-2] = labels[-2].replace('\n', ' ')
plt.pie(df_per_age['count'], labels=labels)
plt.tight_layout()
plt.show()
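The question asks specifically about the age groups of the survivors, so one possible variation is to filter on survived first and then count per age group. The sketch below is only an illustration under stated assumptions: it uses a small made-up table instead of the titanic data, and the Agg backend line is there only so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; remove to open a window
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical toy data; with the real dataset use sns.load_dataset('titanic')
toy = pd.DataFrame({
    'age': [4, 15, 17, 22, 35, 41, 58, 63, 71, None],
    'survived': [1, 0, 1, 1, 0, 0, 1, 0, 1, 1],
})
toy['age_group'] = pd.cut(toy['age'], bins=[0, 18, 60, 100],
                          labels=['young', 'middle aged', 'old'])

# keep survivors only, then count passengers per age group (in category order)
survivors = toy[toy['survived'] == 1]
survivor_counts = survivors.groupby('age_group', observed=False).size()

fig, ax = plt.subplots()
ax.pie(survivor_counts, labels=list(survivor_counts.index), autopct='%1.0f%%')
ax.set_title('Survivors by age group')
fig.tight_layout()
# plt.show()  # uncomment when running interactively
```

The survivor with no recorded age is left out of the chart, because pd.cut assigns no group to a missing age.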

Answer 2

The answer from @JohanC is great for a pie chart.
I think the data is better presented as a bar plot, so this is an alternative, which can be done with pandas.DataFrame.plot and kind='bar'.
Reshape the data with pandas.crosstab, which creates a frequency cross-tabulation table between the two factors.
Optionally include bar annotations using matplotlib.pyplot.bar_label.
- See this answer for additional details about this method.

import pandas as pd
import seaborn as sns

# load data
df = sns.load_dataset('titanic')
df.columns = df.columns.str.title()

# map 0 and 1 of Survived to a string
df.Survived = df.Survived.map({0: 'Died', 1: 'Survived'})

# bin the age
df['Age Group'] = pd.cut(x=df['Age'], bins=[0, 18, 60, 100], labels=['Young', 'Middle Aged', 'Senior'])

# Calculate the counts
ct = pd.crosstab(df['Survived'], df['Age Group'])

# display(ct)
# Age Group  Young  Middle Aged  Senior
# Survived
# Died          69          338      17
# Survived      70          215       5

# plot
ax = ct.plot(kind='bar', rot=0, xlabel='')

# optionally add annotations
for c in ax.containers:
    ax.bar_label(c, label_type='edge')
    
# pad the spacing between the number and the edge of the figure
ax.margins(y=0.1)
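To connect the two answers, the following sketch suggests that the crosstab table holds the same counts as a groupby on both columns followed by unstack. The toy rows are hypothetical and use plain string columns, so column order is alphabetical rather than the categorical order used above.

```python
import pandas as pd

# Hypothetical toy data with already-labelled columns
toy = pd.DataFrame({
    'Survived': ['Died', 'Survived', 'Survived', 'Died', 'Died', 'Survived'],
    'Age Group': ['Young', 'Young', 'Middle Aged', 'Middle Aged', 'Senior', 'Middle Aged'],
})

# frequency table: rows are outcomes, columns are age groups
ct = pd.crosstab(toy['Survived'], toy['Age Group'])

# the same counts from groupby: count each pair, then pivot one key into columns
via_groupby = (toy.groupby(['Survived', 'Age Group']).size()
               .unstack(fill_value=0))

print(ct)
print((ct.values == via_groupby.values).all())
```

crosstab is the shorter spelling when only counts are needed, while the groupby form is easier to extend with other aggregations.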

Answer 1
2 2021-11-08 17:57:19

Answer 2
2 2021-11-08 18:27:45
