简体   繁体   中英

what would be the most efficient way to do this in pandas

I'm trying to figure out the most efficient way to join two dataframes such as below.

I've tried pd.merge and maybe using the rank function but cannot seem to figure a way.

Thanks in advance

df1

| A        | B              | C          |
| -------- | -------------- |------------|
| TBK1     | 2022-01-01     |2022-04-04  |
| TBK1     | 2022-02-02     |2021-01-09  | 
| TBK3     | 2022-05-07     |2023-02-04  |

What I'm trying to achieve is this

df2

| A        | B              | C          | D              | E          |
| -------- | -------------- |------------|----------------|------------|
| TBK1     | 2022-01-01     |2022-04-04  | 2022-02-02     |2021-01-09  | 
| TBK3     | 2022-05-07     |2023-02-04  |NaN             |NaN         |


You might want to use groupby with unstack as advised in this answer :

import pandas as pd
from string import ascii_uppercase

# Reproduce the data
df = pd.DataFrame()
df['A'] = ['TBK1','TBK1', 'TBK3']
df['B'] = ['2022-01-01' , '2022-02-02', '2022-05-07']
df['C'] = ['2022-04-04', '2021-01-09', '2023-02-04']

# Count how many rows exists per unique entry
s = df.groupby(['A']).cumcount() 
# Unstack
df1 = df.set_index(['A', s]).unstack().sort_index(level=1, axis=1)
# Rename columns
df1.columns = [l for l in ascii_uppercase[1:len(df1.columns)+1]]
# Flatten columns names (aesthetics)
df1 = df1.reset_index()

print(df1)

      A           B           C           D           E
0  TBK1  2022-01-01  2022-04-04  2022-02-02  2021-01-09
1  TBK3  2022-05-07  2023-02-04         NaN         NaN

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM