
What would be the most efficient way to do this in pandas?

I'm trying to figure out the most efficient way to turn the first dataframe below into the second, joining the rows that share the same value in A into a single row.

I've tried pd.merge, and thought about using the rank function, but can't seem to find a way.

Thanks in advance.

df1

| A        | B              | C          |
| -------- | -------------- |------------|
| TBK1     | 2022-01-01     |2022-04-04  |
| TBK1     | 2022-02-02     |2021-01-09  | 
| TBK3     | 2022-05-07     |2023-02-04  |

What I'm trying to achieve is this:

df2

| A        | B              | C          | D              | E          |
| -------- | -------------- |------------|----------------|------------|
| TBK1     | 2022-01-01     |2022-04-04  | 2022-02-02     |2021-01-09  | 
| TBK3     | 2022-05-07     |2023-02-04  |NaN             |NaN         |


You might want to use groupby with unstack, as advised in this answer:

import pandas as pd
from string import ascii_uppercase

# Reproduce the data
df = pd.DataFrame()
df['A'] = ['TBK1','TBK1', 'TBK3']
df['B'] = ['2022-01-01' , '2022-02-02', '2022-05-07']
df['C'] = ['2022-04-04', '2021-01-09', '2023-02-04']

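# Number the rows within each group of A (0 for the first occurrence, 1 for the second, ...)
s = df.groupby(['A']).cumcount()
# Unstack that counter into the columns, then order the columns by occurrence
df1 = df.set_index(['A', s]).unstack().sort_index(level=1, axis=1)
# Replace the two-level column labels with single letters starting at 'B'
df1.columns = list(ascii_uppercase[1:len(df1.columns) + 1])
# Move A from the index back into a regular column
df1 = df1.reset_index()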

print(df1)

      A           B           C           D           E
0  TBK1  2022-01-01  2022-04-04  2022-02-02  2021-01-09
1  TBK3  2022-05-07  2023-02-04         NaN         NaN
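The letter-based renaming above runs out of names once there are more than 25 value columns. Below is a minimal sketch of the same cumcount and unstack idea wrapped in a helper, assuming suffixed names such as B_1 and B_2 are acceptable. The function name `widen` and the suffix scheme are illustrative choices, not part of the original answer.

```python
import pandas as pd


def widen(df, key="A"):
    """Put all rows sharing the same `key` value side by side in one row.

    Hypothetical helper; the name and the `<col>_<n>` naming are assumptions.
    """
    # Number the rows within each key group: 0, 1, 2, ...
    pos = df.groupby(key).cumcount()
    # Move that counter into the columns, then order the columns by occurrence
    wide = df.set_index([key, pos]).unstack().sort_index(level=1, axis=1)
    # Flatten the (column, occurrence) pairs into names like "B_1"
    wide.columns = [f"{col}_{n + 1}" for col, n in wide.columns]
    return wide.reset_index()


df = pd.DataFrame({
    "A": ["TBK1", "TBK1", "TBK3"],
    "B": ["2022-01-01", "2022-02-02", "2022-05-07"],
    "C": ["2022-04-04", "2021-01-09", "2023-02-04"],
})

wide = widen(df)
print(wide)
```

Groups with fewer rows than the largest group are padded with NaN, as in the output above.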
