[英]In pandas how to create new column from observations and aggregate values from another column
I have this dataframe and I want to transform it into another dataframe with a column which combines observations from several columns in the first dataframe and aggregates values from the column "points".我有这个 dataframe 并且我想将它转换为另一个 dataframe 的列,该列结合了第一个 dataframe 中的多个列的观察值并聚合来自列“点”的值。 Here's the dataframe and below is the desired result:
这是 dataframe 以下是所需的结果:
player_data = pd.DataFrame({"customer_id": ["100001", "100002", "100005", "100006", "100007", "100011", "100012",
"100013", "100022", "100023", "100025", "100028", "100029", "100030"],
"country": ["Austria", "Germany", "Germany", "Sweden", "Sweden", "Austria", "Sweden",
"Austria", "Germany", "Germany", "Austria", "Austria", "Germany", "Austria"],
"category": ["basic", "pro", "basic", "advanced", "pro", "intermidiate", "pro",
"basic", "intermidiate", "intermidiate", "advanced", "basic", "intermidiate", "basic"],
"gender": ["male", "male", "female", "female", "female", "male", "female",
"female", "male", "male", "female", "male", "male", "male"],
"age_group": ["20", "30", "20", "30", "40", "20", "40",
"20", "30", "30", "40", "20", "30", "20"],
"points": [200, 480, 180, 330, 440, 240, 520, 180, 320, 300, 320, 200, 280, 180]})
The new dataframe is supposed to look like this:新的 dataframe 应该是这样的:
Thank you all!谢谢你们!
Would this be what you are looking for?这会是你要找的吗?
df_new = df.groupby(['country', 'category', 'gender', 'age_group'])['points'].agg('sum').reset_index()
df_new.pivot_table(values = 'points', index = ['country', 'category', 'gender'], columns = 'age_group', fill_value = 0).reset_index().sort_values(['country', 'category', 'gender'])
However, this will not have any columns that have only 0s for example Australia |但是,这不会有任何只有 0 的列,例如澳大利亚 | Advanced |
高级 | M will not be in here since there wasn't any mention in it for the original df.
M 不会在这里,因为原始 df 中没有任何提及。 If you wanted to dynamically add them you might need to rethink the structure of your df.
如果您想动态添加它们,您可能需要重新考虑 df.
Try this:尝试这个:
midx = pd.MultiIndex.from_product([player_data['country'].unique(),
player_data['category'].unique(),
player_data['gender'].unique()])
player_data.groupby(['country', 'category', 'gender', 'age_group'])['points']\
.sum()\
.unstack(fill_value=0)\
.reindex(midx, fill_value=0)
Output: Output:
age_group 20 30 40
Austria basic male 580 0 0
female 180 0 0
pro male 0 0 0
female 0 0 0
advanced male 0 0 0
female 0 0 320
intermidiate male 240 0 0
female 0 0 0
Germany basic male 0 0 0
female 180 0 0
pro male 0 480 0
female 0 0 0
advanced male 0 0 0
female 0 0 0
intermidiate male 0 900 0
female 0 0 0
Sweden basic male 0 0 0
female 0 0 0
pro male 0 0 0
female 0 0 960
advanced male 0 0 0
female 0 330 0
intermidiate male 0 0 0
female 0 0 0
This works.这行得通。 Although the loops are a pretty janky way of sorting the zeros.
尽管循环是对零进行排序的一种非常笨拙的方式。
df = player_data.groupby(["country", "category", "gender", "age_group"]).points.sum().reset_index()
df = df.pivot_table(values='points', index=['country', 'category', 'gender'], columns='age_group', fill_value=0)
for country in player_data.country.unique():
for category in player_data.category.unique():
for gender in player_data.gender.unique():
if (country, category, gender) not in df.index:
df.loc[(country, category, gender)] = np.zeros(len(player_data.age_group.unique()), dtype=int)
df = df.sort_values(['country', 'category', 'gender']).reset_index()
Output: Output:
age_group country category gender 20 30 40
0 Austria advanced female 0 0 320
1 Austria advanced male 0 0 0
2 Austria basic female 180 0 0
3 Austria basic male 580 0 0
4 Austria intermidiate female 0 0 0
5 Austria intermidiate male 240 0 0
6 Austria pro female 0 0 0
7 Austria pro male 0 0 0
8 Germany advanced female 0 0 0
9 Germany advanced male 0 0 0
10 Germany basic female 180 0 0
11 Germany basic male 0 0 0
12 Germany intermidiate female 0 0 0
13 Germany intermidiate male 0 900 0
14 Germany pro female 0 0 0
15 Germany pro male 0 480 0
16 Sweden advanced female 0 3...
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.