In pandas, how to create new columns from the values of one column and aggregate values from another column?
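I have this DataFrame and want to transform it into another DataFrame whose columns are built from the observations in several columns of the first one, aggregating the values from the "points" column. Here is the DataFrame, followed by the desired result: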
import pandas as pd

player_data = pd.DataFrame({
    "customer_id": ["100001", "100002", "100005", "100006", "100007", "100011", "100012",
                    "100013", "100022", "100023", "100025", "100028", "100029", "100030"],
    "country": ["Austria", "Germany", "Germany", "Sweden", "Sweden", "Austria", "Sweden",
                "Austria", "Germany", "Germany", "Austria", "Austria", "Germany", "Austria"],
    "category": ["basic", "pro", "basic", "advanced", "pro", "intermidiate", "pro",
                 "basic", "intermidiate", "intermidiate", "advanced", "basic", "intermidiate", "basic"],
    "gender": ["male", "male", "female", "female", "female", "male", "female",
               "female", "male", "male", "female", "male", "male", "male"],
    "age_group": ["20", "30", "20", "30", "40", "20", "40",
                  "20", "30", "30", "40", "20", "30", "20"],
    "points": [200, 480, 180, 330, 440, 240, 520, 180, 320, 300, 320, 200, 280, 180],
})
Thank you all!
Is this what you are looking for?
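# df is assumed to be the question's DataFrame (player_data)
df = player_data
# Sum the points for every country / category / gender / age_group combination
df_new = df.groupby(['country', 'category', 'gender', 'age_group'])['points'].agg('sum').reset_index()
# Spread the age groups into columns; combinations without data become 0
df_new.pivot_table(values='points', index=['country', 'category', 'gender'],
                   columns='age_group', fill_value=0).reset_index().sort_values(['country', 'category', 'gender'])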
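However, this will not contain any rows that hold only zeros. For example, Austria | advanced | male will be missing, because that combination never appears in the original df. If you want to add such rows dynamically, you may need to rethink the df.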
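To see which combinations drop out, the pivoted index can be compared with the full Cartesian product of the key values. The following is only a minimal sketch on a small hypothetical frame (`toy` is not the question's data, and it uses two key columns instead of three):

```python
import itertools
import pandas as pd

# Hypothetical toy data: Germany / female never occurs.
toy = pd.DataFrame({
    "country": ["Austria", "Austria", "Germany"],
    "gender": ["male", "female", "male"],
    "age_group": ["20", "30", "20"],
    "points": [200, 180, 480],
})

# Same idea as the answer above: sum the points and spread age groups into columns.
wide = toy.pivot_table(values="points", index=["country", "gender"],
                       columns="age_group", aggfunc="sum", fill_value=0)

# Every combination that could exist, minus those that survived the pivot.
all_combos = set(itertools.product(toy["country"].unique(), toy["gender"].unique()))
missing = sorted(all_combos - set(wide.index))
print(missing)
```

Any tuple listed in `missing` is a row that the pivot alone will not produce.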
Try this:
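# Every possible country / category / gender combination, observed or not
midx = pd.MultiIndex.from_product([player_data['country'].unique(),
                                   player_data['category'].unique(),
                                   player_data['gender'].unique()])
# Sum the points, move age_group into columns, then add the missing combinations as zero rows
player_data.groupby(['country', 'category', 'gender', 'age_group'])['points']\
    .sum()\
    .unstack(fill_value=0)\
    .reindex(midx, fill_value=0)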
Output:
age_group                     20   30   40
Austria basic        male    580    0    0
                     female  180    0    0
        pro          male      0    0    0
                     female    0    0    0
        advanced     male      0    0    0
                     female    0    0  320
        intermidiate male    240    0    0
                     female    0    0    0
Germany basic        male      0    0    0
                     female  180    0    0
        pro          male      0  480    0
                     female    0    0    0
        advanced     male      0    0    0
                     female    0    0    0
        intermidiate male      0  900    0
                     female    0    0    0
Sweden  basic        male      0    0    0
                     female    0    0    0
        pro          male      0    0    0
                     female    0    0  960
        advanced     male      0    0    0
                     female    0  330    0
        intermidiate male      0    0    0
                     female    0    0    0
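A different technique, not used in any of the answers here, is to make the key columns categorical and group with `observed=False`, so that pandas itself emits the unobserved combinations. This is only a sketch on hypothetical toy data (`toy` and `keys` are illustrative names), and the row order follows the sorted categories rather than order of first appearance:

```python
import pandas as pd

# Hypothetical toy data: Germany / female never occurs.
toy = pd.DataFrame({
    "country": ["Austria", "Austria", "Germany"],
    "gender": ["male", "female", "male"],
    "age_group": ["20", "30", "20"],
    "points": [200, 180, 480],
})

keys = ["country", "gender", "age_group"]
# Categorical keys let groupby know the full set of possible values.
toy = toy.astype({key: "category" for key in keys})

# observed=False keeps every category combination; empty groups sum to 0.
wide = (toy.groupby(keys, observed=False)["points"]
           .sum()
           .unstack(fill_value=0))
print(wide)
```

Whether this is preferable to an explicit `reindex` depends on whether categorical dtypes are acceptable in the rest of the pipeline.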
This works, although the loop is a very clumsy way to sort out the zeros.
import numpy as np

# Sum the points per combination, then spread the age groups into columns
df = player_data.groupby(["country", "category", "gender", "age_group"]).points.sum().reset_index()
df = df.pivot_table(values='points', index=['country', 'category', 'gender'], columns='age_group', fill_value=0)
# Add an all-zero row for every combination that is not in the pivoted index yet
for country in player_data.country.unique():
    for category in player_data.category.unique():
        for gender in player_data.gender.unique():
            if (country, category, gender) not in df.index:
                df.loc[(country, category, gender)] = np.zeros(len(player_data.age_group.unique()), dtype=int)
df = df.sort_values(['country', 'category', 'gender']).reset_index()
Output:
age_group  country category     gender   20   30   40
0          Austria advanced     female    0    0  320
1          Austria advanced     male      0    0    0
2          Austria basic        female  180    0    0
3          Austria basic        male    580    0    0
4          Austria intermidiate female    0    0    0
5          Austria intermidiate male    240    0    0
6          Austria pro          female    0    0    0
7          Austria pro          male      0    0    0
8          Germany advanced     female    0    0    0
9          Germany advanced     male      0    0    0
10         Germany basic        female  180    0    0
11         Germany basic        male      0    0    0
12         Germany intermidiate female    0    0    0
13         Germany intermidiate male      0  900    0
14         Germany pro          female    0    0    0
15         Germany pro          male      0  480    0
16         Sweden  advanced     female    0 3...
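The nested loops can probably be avoided by reindexing the pivoted frame against a sorted product of the key values. The sketch below assumes a small hypothetical frame (`toy`, with two key columns instead of three) and is not the answerer's code:

```python
import pandas as pd

# Hypothetical toy data: Germany / female never occurs.
toy = pd.DataFrame({
    "country": ["Austria", "Austria", "Germany"],
    "gender": ["male", "female", "male"],
    "age_group": ["20", "30", "20"],
    "points": [200, 180, 480],
})

wide = toy.pivot_table(values="points", index=["country", "gender"],
                       columns="age_group", aggfunc="sum", fill_value=0)

# Sorted product of the key values, so no separate sort step is needed.
full_index = pd.MultiIndex.from_product(
    [sorted(toy["country"].unique()), sorted(toy["gender"].unique())],
    names=["country", "gender"],
)

# Missing combinations are added as all-zero rows in a single call.
result = wide.reindex(full_index, fill_value=0).reset_index()
print(result)
```

This yields the same flat, sorted layout as the loop version, with one `reindex` call in place of the membership checks.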