[英]what would be the most efficient way to do this in pandas
I'm trying to figure out the most efficient way to join two dataframes such as below.我试图找出连接两个数据框的最有效方法,如下所示。
I've tried pd.merge and maybe using the rank function but cannot seem to figure a way.我已经尝试过 pd.merge 并且可能使用了 rank 函数,但似乎无法找到方法。
Thanks in advance提前致谢
df1 df1
| A | B | C |
| -------- | -------------- |------------|
| TBK1 | 2022-01-01 |2022-04-04 |
| TBK1 | 2022-02-02 |2021-01-09 |
| TBK3 | 2022-05-07 |2023-02-04 |
What I'm trying to achieve is this我想要实现的是这个
df2 df2
| A | B | C | D | E |
| -------- | -------------- |------------|----------------|------------|
| TBK1 | 2022-01-01 |2022-04-04 | 2022-02-02 |2021-01-09 |
| TBK3 | 2022-05-07 |2023-02-04 |NaN |NaN |
You might want to use groupby
with unstack
as advised in this answer :您可能希望按照此答案中的建议将
groupby
与unstack
一起使用:
import pandas as pd
from string import ascii_uppercase
# Reproduce the data
df = pd.DataFrame()
df['A'] = ['TBK1','TBK1', 'TBK3']
df['B'] = ['2022-01-01' , '2022-02-02', '2022-05-07']
df['C'] = ['2022-04-04', '2021-01-09', '2023-02-04']
# Count how many rows exists per unique entry
s = df.groupby(['A']).cumcount()
# Unstack
df1 = df.set_index(['A', s]).unstack().sort_index(level=1, axis=1)
# Rename columns
df1.columns = [l for l in ascii_uppercase[1:len(df1.columns)+1]]
# Flatten columns names (aesthetics)
df1 = df1.reset_index()
print(df1)
A B C D E
0 TBK1 2022-01-01 2022-04-04 2022-02-02 2021-01-09
1 TBK3 2022-05-07 2023-02-04 NaN NaN
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.