Rearranging a table with pandas

Question

I have a table of customer IDs and emails. Some customers have more than one email. The table looks like this:

| Customer  | Email          |
| ----------| -------------- |
| 1         | jdoe@mail.com  |
| 2         | jane1@mail.com |
| 3         | adam@mail.com  |
| 1         | john_d@mail.com|

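What I want to do is rearrange the table so that each customer ID has only one row, with any secondary emails added as additional columns. Something like this: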

| Customer  | Email1         | Email2          |
| --------- | -------------- | --------------- |
| 1         | jdoe@mail.com  | john_d@mail.com |
| 2         | jane1@mail.com |                 |
| 3         | adam@mail.com  |                 |

What is the best way to do this with pandas? I tried using df.pivot, but that does not seem to work for me.

Answer 1

You can use Series.duplicated() + pd.merge() + DataFrame.drop_duplicates():

    import pandas as pd

    # Select the rows whose Customer ID has already appeared (the second emails).
    df_seconds_email = df[df['Customer'].duplicated()]

    # Left-merge the original DataFrame ('df') with those rows; the suffixes
    # parameter names the merged email column 'Email2'. Then drop duplicate
    # rows, keeping the first row for each 'Customer'.
    df = pd.merge(df, df_seconds_email, how='left', on=['Customer'], suffixes=('', '2')).drop_duplicates(subset='Customer')
    print(df)

Output:

       Customer           Email           Email2
    0         1   jdoe@mail.com  john_d@mail.com
    1         2  jane1@mail.com              NaN
    2         3   adam@mail.com              NaN
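For anyone who wants to try this end to end, here is a minimal, self-contained sketch of the same approach. The sample DataFrame is rebuilt from the table in the question; the names `second_emails` and `result` are my own and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Customer": [1, 2, 3, 1],
    "Email": ["jdoe@mail.com", "jane1@mail.com", "adam@mail.com", "john_d@mail.com"],
})

# Rows whose Customer value has already appeared, i.e. the second emails.
second_emails = df[df["Customer"].duplicated()]

# Left-merge on Customer so every original row gains an 'Email2' column,
# then keep only the first row per customer.
result = (
    df.merge(second_emails, how="left", on="Customer", suffixes=("", "2"))
      .drop_duplicates(subset="Customer")
      .reset_index(drop=True)
)
print(result)
```

As far as I can tell, this only ever produces a single extra column: a customer with three or more emails would keep just the first two, so it fits best when two emails per customer is the maximum.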

Answer 2

You can use cumcount to create a MultiIndex, then reshape the data with unstack and rename the columns with add_prefix:

    df = (df.set_index(['Customer',df.groupby('Customer').cumcount()])['Email']
        .unstack()
        .add_prefix('Email')
        .reset_index())
    print(df)

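This gives you one row per customer, with the emails spread across columns. Note that cumcount starts at 0, so the columns come out as Email0, Email1, and so on.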
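If you want the columns numbered from 1, as in the question's desired output, one option is to add 1 to the counter before reshaping. The sketch below does that and uses pivot instead of unstack. My guess (an assumption, since the question does not show the failing call) is that df.pivot did not work because the table has no column to pivot on, and the counter supplies one. The helper column `n` and the name `wide` are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "Customer": [1, 2, 3, 1],
    "Email": ["jdoe@mail.com", "jane1@mail.com", "adam@mail.com", "john_d@mail.com"],
})

# Number each customer's emails 1, 2, ... in order of appearance.
numbered = df.assign(n=df.groupby("Customer").cumcount() + 1)

# Pivot on that counter, then tidy up the column labels.
wide = (
    numbered.pivot(index="Customer", columns="n", values="Email")
            .add_prefix("Email")        # 1, 2 -> 'Email1', 'Email2'
            .rename_axis(None, axis=1)  # drop the leftover 'n' label on the columns
            .reset_index()
)
print(wide)
```

Unlike the merge-based approach, this should scale to any number of emails per customer, adding Email3, Email4, and so on as needed.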


Solution 1
1 2021-07-20 23:22:39

Solution 2
0 2021-07-20 22:54:14
