
In pandas how to create new column from observations and aggregate values from another column

I have this dataframe, and I want to transform it into another dataframe with a column that combines observations from several columns of the first dataframe and aggregates the values from the column "points". Here's the dataframe, and below it is the desired result:

import pandas as pd

player_data = pd.DataFrame({"customer_id": ["100001", "100002", "100005", "100006", "100007", "100011", "100012", 
                                            "100013", "100022", "100023", "100025", "100028", "100029", "100030"],
                            "country": ["Austria", "Germany", "Germany", "Sweden", "Sweden", "Austria", "Sweden", 
                                        "Austria", "Germany", "Germany", "Austria", "Austria", "Germany", "Austria"],
                            "category": ["basic", "pro", "basic", "advanced", "pro", "intermidiate", "pro", 
                                         "basic", "intermidiate", "intermidiate", "advanced", "basic", "intermidiate", "basic"],
                            "gender": ["male", "male", "female", "female", "female", "male", "female",
                                       "female", "male", "male", "female", "male", "male", "male"],
                            "age_group": ["20", "30", "20", "30", "40", "20", "40",
                                          "20", "30", "30", "40", "20", "30", "20"],
                            "points": [200, 480, 180, 330, 440, 240, 520, 180, 320, 300, 320, 200, 280, 180]})

The new dataframe is supposed to look like this:

Thank you all!

Is this what you are looking for?

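import pandas as pd

# Sum the points for each (country, category, gender, age_group) combination
df_new = player_data.groupby(['country', 'category', 'gender', 'age_group'])['points'].agg('sum').reset_index()
# Spread age_group into columns; cells with no data become 0
df_new.pivot_table(values='points', index=['country', 'category', 'gender'],
                   columns='age_group', fill_value=0).reset_index().sort_values(['country', 'category', 'gender'])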

However, this will not include any rows that would contain only 0s. For example, Austria | advanced | male will not appear, since that combination never occurs in the original df. If you want to add such rows dynamically, you might need to rethink the structure of your df.
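A possible simplification, offered as a sketch rather than a tested drop-in: pivot_table can do the summing itself when given aggfunc='sum', so the separate groupby step is not strictly needed. The snippet rebuilds the question's data (without customer_id, which is unused) and checks that a combination never seen in the data, such as Austria / advanced / male, is indeed absent from the result.

```python
import pandas as pd

# The question's data, minus the unused customer_id column
player_data = pd.DataFrame({
    "country": ["Austria", "Germany", "Germany", "Sweden", "Sweden", "Austria", "Sweden",
                "Austria", "Germany", "Germany", "Austria", "Austria", "Germany", "Austria"],
    "category": ["basic", "pro", "basic", "advanced", "pro", "intermidiate", "pro",
                 "basic", "intermidiate", "intermidiate", "advanced", "basic", "intermidiate", "basic"],
    "gender": ["male", "male", "female", "female", "female", "male", "female",
               "female", "male", "male", "female", "male", "male", "male"],
    "age_group": ["20", "30", "20", "30", "40", "20", "40",
                  "20", "30", "30", "40", "20", "30", "20"],
    "points": [200, 480, 180, 330, 440, 240, 520, 180, 320, 300, 320, 200, 280, 180],
})

# pivot_table groups and sums in one step; missing cells are filled with 0
wide = player_data.pivot_table(
    values="points",
    index=["country", "category", "gender"],
    columns="age_group",
    aggfunc="sum",
    fill_value=0,
)
print(wide)

# Combinations that never occur in the data are simply absent from the index
print(("Austria", "advanced", "male") in wide.index)
```

Only the 9 observed (country, category, gender) combinations end up as rows, which is why the all-zero rows have to be added separately.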

Try this:

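import pandas as pd

# Every possible (country, category, gender) combination, including unseen ones
midx = pd.MultiIndex.from_product([player_data['country'].unique(), 
                                   player_data['category'].unique(), 
                                   player_data['gender'].unique()])
# Sum points per group, spread age_group into columns, then add the missing rows as 0s
player_data.groupby(['country', 'category', 'gender', 'age_group'])['points']\
           .sum()\
           .unstack(fill_value=0)\
           .reindex(midx, fill_value=0)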

Output:

age_group                     20   30   40
Austria basic        male    580    0    0
                     female  180    0    0
        pro          male      0    0    0
                     female    0    0    0
        advanced     male      0    0    0
                     female    0    0  320
        intermidiate male    240    0    0
                     female    0    0    0
Germany basic        male      0    0    0
                     female  180    0    0
        pro          male      0  480    0
                     female    0    0    0
        advanced     male      0    0    0
                     female    0    0    0
        intermidiate male      0  900    0
                     female    0    0    0
Sweden  basic        male      0    0    0
                     female    0    0    0
        pro          male      0    0    0
                     female    0    0  960
        advanced     male      0    0    0
                     female    0  330    0
        intermidiate male      0    0    0
                     female    0    0    0
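Another option, which is a different technique from the reindex above: if the key columns are converted to pandas categoricals, groupby(..., observed=False) enumerates every combination of categories by itself, so no hand-built MultiIndex is needed. A minimal sketch, assuming a pandas version that supports the observed parameter; the names cat_data and full are just illustrative. Note that categories are sorted alphabetically, so the row order differs from the unique() order shown above.

```python
import pandas as pd

# The question's data, minus the unused customer_id column
player_data = pd.DataFrame({
    "country": ["Austria", "Germany", "Germany", "Sweden", "Sweden", "Austria", "Sweden",
                "Austria", "Germany", "Germany", "Austria", "Austria", "Germany", "Austria"],
    "category": ["basic", "pro", "basic", "advanced", "pro", "intermidiate", "pro",
                 "basic", "intermidiate", "intermidiate", "advanced", "basic", "intermidiate", "basic"],
    "gender": ["male", "male", "female", "female", "female", "male", "female",
               "female", "male", "male", "female", "male", "male", "male"],
    "age_group": ["20", "30", "20", "30", "40", "20", "40",
                  "20", "30", "30", "40", "20", "30", "20"],
    "points": [200, 480, 180, 330, 440, 240, 520, 180, 320, 300, 320, 200, 280, 180],
})

keys = ["country", "category", "gender", "age_group"]
# Categorical dtype tells groupby the full set of possible values per key
cat_data = player_data.astype({k: "category" for k in keys})

# observed=False keeps every category combination; empty groups sum to 0
full = (
    cat_data.groupby(keys, observed=False)["points"]
    .sum()
    .unstack(fill_value=0)
)
print(full)
```

The trade-off is that the grouping columns change dtype; call reset_index() and astype(str) on them afterwards if plain string columns are needed.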

This works, although the loops are a rather clunky way of adding the all-zero rows.

import numpy as np
import pandas as pd

# Sum points per group, then spread age_group into columns
df = player_data.groupby(["country", "category", "gender", "age_group"]).points.sum().reset_index()
df = df.pivot_table(values='points', index=['country', 'category', 'gender'], columns='age_group', fill_value=0)

# Add an all-zero row for every (country, category, gender) combination that never occurs
for country in player_data.country.unique():
    for category in player_data.category.unique():
        for gender in player_data.gender.unique():
            if (country, category, gender) not in df.index:
                df.loc[(country, category, gender)] = np.zeros(len(player_data.age_group.unique()), dtype=int)

# sort_values accepts index level names; reset_index turns the levels back into columns
df = df.sort_values(['country', 'category', 'gender']).reset_index()

Output:

age_group  country      category  gender   20   30   40
0          Austria      advanced  female    0    0  320
1          Austria      advanced    male    0    0    0
2          Austria         basic  female  180    0    0
3          Austria         basic    male  580    0    0
4          Austria  intermidiate  female    0    0    0
5          Austria  intermidiate    male  240    0    0
6          Austria           pro  female    0    0    0
7          Austria           pro    male    0    0    0
8          Germany      advanced  female    0    0    0
9          Germany      advanced    male    0    0    0
10         Germany         basic  female  180    0    0
11         Germany         basic    male    0    0    0
12         Germany  intermidiate  female    0    0    0
13         Germany  intermidiate    male    0  900    0
14         Germany           pro  female    0    0    0
15         Germany           pro    male    0  480    0
16          Sweden      advanced  female    0  3...
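The nested loops can likely be replaced by a single reindex over every (country, category, gender) combination, which is the same idea as the MultiIndex.from_product answer. A sketch, assuming the level values should be sorted alphabetically as in the output above; full_index and result are illustrative names.

```python
import pandas as pd

# The question's data, minus the unused customer_id column
player_data = pd.DataFrame({
    "country": ["Austria", "Germany", "Germany", "Sweden", "Sweden", "Austria", "Sweden",
                "Austria", "Germany", "Germany", "Austria", "Austria", "Germany", "Austria"],
    "category": ["basic", "pro", "basic", "advanced", "pro", "intermidiate", "pro",
                 "basic", "intermidiate", "intermidiate", "advanced", "basic", "intermidiate", "basic"],
    "gender": ["male", "male", "female", "female", "female", "male", "female",
               "female", "male", "male", "female", "male", "male", "male"],
    "age_group": ["20", "30", "20", "30", "40", "20", "40",
                  "20", "30", "30", "40", "20", "30", "20"],
    "points": [200, 480, 180, 330, 440, 240, 520, 180, 320, 300, 320, 200, 280, 180],
})

keys = ["country", "category", "gender"]
wide = player_data.pivot_table(values="points", index=keys, columns="age_group",
                               aggfunc="sum", fill_value=0)

# All combinations of the sorted unique values, named like the pivot's index levels
full_index = pd.MultiIndex.from_product(
    [sorted(player_data[k].unique()) for k in keys], names=keys
)

# Rows missing from the pivot are added with 0 in every age_group column
result = wide.reindex(full_index, fill_value=0).reset_index()
print(result)
```

Because full_index is already built in sorted order, no separate sort_values call is needed.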


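Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.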
