簡體   English   中英

在 pandas 如何從觀察值創建新列並從另一列聚合值

[英]In pandas how to create new column from observations and aggregate values from another column

我有這個 dataframe 並且我想將它轉換為另一個 dataframe 的列,該列結合了第一個 dataframe 中的多個列的觀察值並聚合來自列“點”的值。 這是 dataframe 以下是所需的結果:

player_data = pd.DataFrame({"customer_id": ["100001", "100002", "100005", "100006", "100007", "100011", "100012", 
                                            "100013", "100022", "100023", "100025", "100028", "100029", "100030"],
                            "country": ["Austria", "Germany", "Germany", "Sweden", "Sweden", "Austria", "Sweden", 
                                        "Austria", "Germany", "Germany", "Austria", "Austria", "Germany", "Austria"],
                            "category": ["basic", "pro", "basic", "advanced", "pro", "intermidiate", "pro", 
                                         "basic", "intermidiate", "intermidiate", "advanced", "basic", "intermidiate", "basic"],
                            "gender": ["male", "male", "female", "female", "female", "male", "female",
                                       "female", "male", "male", "female", "male", "male", "male"],
                            "age_group": ["20", "30", "20", "30", "40", "20", "40",
                                          "20", "30", "30", "40", "20", "30", "20"],
                            "points": [200, 480, 180, 330, 440, 240, 520, 180, 320, 300, 320, 200, 280, 180]})

新的 dataframe 應該是這樣的:

謝謝你們!

這會是你要找的嗎?

df_new = df.groupby(['country', 'category', 'gender', 'age_group'])['points'].agg('sum').reset_index()
df_new.pivot_table(values = 'points', index = ['country', 'category', 'gender'], columns = 'age_group', fill_value = 0).reset_index().sort_values(['country', 'category', 'gender'])

但是,這不會有任何只有 0 的列,例如澳大利亞 | 高級 | M 不會在這里,因為原始 df 中沒有任何提及。 如果您想動態添加它們,您可能需要重新考慮 df.

嘗試這個:

midx = pd.MultiIndex.from_product([player_data['country'].unique(), 
                                   player_data['category'].unique(), 
                                   player_data['gender'].unique()])
player_data.groupby(['country', 'category', 'gender', 'age_group'])['points']\
           .sum()\
           .unstack(fill_value=0)\
           .reindex(midx, fill_value=0)

Output:

age_group                     20   30   40
Austria basic        male    580    0    0
                     female  180    0    0
        pro          male      0    0    0
                     female    0    0    0
        advanced     male      0    0    0
                     female    0    0  320
        intermidiate male    240    0    0
                     female    0    0    0
Germany basic        male      0    0    0
                     female  180    0    0
        pro          male      0  480    0
                     female    0    0    0
        advanced     male      0    0    0
                     female    0    0    0
        intermidiate male      0  900    0
                     female    0    0    0
Sweden  basic        male      0    0    0
                     female    0    0    0
        pro          male      0    0    0
                     female    0    0  960
        advanced     male      0    0    0
                     female    0  330    0
        intermidiate male      0    0    0
                     female    0    0    0

這行得通。 盡管循環是對零進行排序的一種非常笨拙的方式。

df = player_data.groupby(["country", "category", "gender", "age_group"]).points.sum().reset_index()
df = df.pivot_table(values='points', index=['country', 'category', 'gender'], columns='age_group', fill_value=0)


for country in player_data.country.unique():
    for category in player_data.category.unique():
        for gender in player_data.gender.unique():
            if (country, category, gender) not in df.index:
                df.loc[(country, category, gender)] = np.zeros(len(player_data.age_group.unique()), dtype=int)

df = df.sort_values(['country', 'category', 'gender']).reset_index()

Output:

age_group  country      category  gender   20   30   40
0          Austria      advanced  female    0    0  320
1          Austria      advanced    male    0    0    0
2          Austria         basic  female  180    0    0
3          Austria         basic    male  580    0    0
4          Austria  intermidiate  female    0    0    0
5          Austria  intermidiate    male  240    0    0
6          Austria           pro  female    0    0    0
7          Austria           pro    male    0    0    0
8          Germany      advanced  female    0    0    0
9          Germany      advanced    male    0    0    0
10         Germany         basic  female  180    0    0
11         Germany         basic    male    0    0    0
12         Germany  intermidiate  female    0    0    0
13         Germany  intermidiate    male    0  900    0
14         Germany           pro  female    0    0    0
15         Germany           pro    male    0  480    0
16          Sweden      advanced  female    0  3...

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM