在熊猫中做到这一点的最有效方法是什么

Question

I'm trying to figure out the most efficient way to join two dataframes such as below.我试图找出连接两个数据框的最有效方法，如下所示。

I've tried pd.merge and maybe using the rank function but cannot seem to figure a way.我已经尝试过 pd.merge 并且可能使用了 rank 函数，但似乎无法找到方法。

Thanks in advance提前致谢

df1 df1

| A        | B              | C          |
| -------- | -------------- |------------|
| TBK1     | 2022-01-01     |2022-04-04  |
| TBK1     | 2022-02-02     |2021-01-09  | 
| TBK3     | 2022-05-07     |2023-02-04  |

What I'm trying to achieve is this我想要实现的是这个

df2 df2

| A        | B              | C          | D              | E          |
| -------- | -------------- |------------|----------------|------------|
| TBK1     | 2022-01-01     |2022-04-04  | 2022-02-02     |2021-01-09  | 
| TBK3     | 2022-05-07     |2023-02-04  |NaN             |NaN         |

Answer 1

You might want to use groupby with unstack as advised in this answer :您可能希望按照此答案中的建议将groupby与unstack一起使用：

import pandas as pd
from string import ascii_uppercase

# Reproduce the data
df = pd.DataFrame()
df['A'] = ['TBK1','TBK1', 'TBK3']
df['B'] = ['2022-01-01' , '2022-02-02', '2022-05-07']
df['C'] = ['2022-04-04', '2021-01-09', '2023-02-04']

# Count how many rows exists per unique entry
s = df.groupby(['A']).cumcount() 
# Unstack
df1 = df.set_index(['A', s]).unstack().sort_index(level=1, axis=1)
# Rename columns
df1.columns = [l for l in ascii_uppercase[1:len(df1.columns)+1]]
# Flatten columns names (aesthetics)
df1 = df1.reset_index()

print(df1)

      A           B           C           D           E
0  TBK1  2022-01-01  2022-04-04  2022-02-02  2021-01-09
1  TBK3  2022-05-07  2023-02-04         NaN         NaN

在熊猫中做到这一点的最有效方法是什么

问题描述

1 个解决方案

解决方案1
1 已采纳 2022-06-02 04:30:16

在熊猫中做到这一点的最有效方法是什么

问题描述

1 个解决方案

解决方案1 1 已采纳 2022-06-02 04:30:16

解决方案1
1 已采纳 2022-06-02 04:30:16