pandas merge as of on one column, exactly on other columns?

I am trying to merge 2 dataframes, with exact matching on some columns and as-of matching on another column (typically a date). The intention is very well described in this post (I'll copy and paste the main content below):

Pandas: Approximate join on one column, exact match on other columns

The post above was answered, but it dates back to 2016, before the introduction of pandas.merge_asof. I believe there may be an easier answer now that it has been released. A brute-force approach would be to run merge_asof separately for each group of rows that share the same values in the columns I want to match exactly. But is there a more elegant version?
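The per-group approach mentioned above could look roughly like the sketch below. The helper name asof_per_group is hypothetical, the b_col values are taken from the printed df2 further down, and direction='nearest' is an assumption inferred from the desired output rather than something stated in the question.

```python
import pandas as pd

# Data from the question; dates are parsed so merge_asof can order them.
df1 = pd.DataFrame({
    'index': ['a1', 'a2', 'a3', 'a4'],
    'col1': ['1232', '432', '432', '123'],
    'col2': ['asd', 'dsa12', 'dsa12', 'asd2'],
    'col3': ['1', '2', '2', '3'],
    'date': pd.to_datetime(['2010-01-23', '2016-05-20', '2010-06-20', '2008-10-21']),
})
# df2's own labels (b1..b4) are left out because the result does not need them.
df2 = pd.DataFrame({
    'col1': ['132', '432', '432', '123'],
    'col2': ['asd', 'dsa12', 'dsa12', 'sd2'],
    'col3': ['1', '2', '2', '3'],
    'date': pd.to_datetime(['2010-01-23', '2016-05-23', '2010-06-10', '2008-10-21']),
    'b_col': [1, 2, 3, 4],
})


def asof_per_group(left, right, on, by, direction):
    """Hypothetical helper: one merge_asof call per exact-match group."""
    right_groups = {key: grp for key, grp in right.groupby(by)}
    pieces = []
    for key, left_grp in left.groupby(by):
        right_grp = right_groups.get(key)
        if right_grp is None:
            continue  # no rows in `right` share these exact values
        pieces.append(pd.merge_asof(
            left_grp.sort_values(on),
            right_grp.sort_values(on).drop(columns=by),
            on=on,
            direction=direction,
        ))
    return pd.concat(pieces, ignore_index=True)


brute = asof_per_group(df1, df2, on='date', by=['col1', 'col2', 'col3'],
                       direction='nearest')
print(brute.set_index('index').sort_index())
```

This issues one merge_asof call per group, so it never builds all pairs, but the Python-level loop may be slow when there are many groups.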

Precise description of the desired inputs and outputs:

Inputs:

df1 = pd.DataFrame({'index': ['a1','a2','a3','a4'], 'col1': ['1232','432','432','123'], 'col2': ['asd','dsa12','dsa12','asd2'], 'col3': ['1','2','2','3'], 'date': ['2010-01-23','2016-05-20','2010-06-20','2008-10-21'],}).set_index('index')

df1
Out[430]: 
       col1   col2 col3        date
index                              
a1     1232    asd    1  2010-01-23
a2      432  dsa12    2  2016-05-20
a3      432  dsa12    2  2010-06-20
a4      123   asd2    3  2008-10-21

df2 = pd.DataFrame({'index': ['b1','b2','b3','b4'], 'col1': ['132','432','432','123'], 'col2': ['asd','dsa12','dsa12','sd2'], 'col3': ['1','2','2','3'], 'date': ['2010-01-23','2016-05-23','2010-06-10','2008-10-21'], 'b_col': [1, 2, 3, 4],}).set_index('index')

df2
Out[434]: 
      col1   col2 col3        date    b_col
index                             
b1     132    asd    1  2010-01-23        1
b2     432  dsa12    2  2016-05-23        2
b3     432  dsa12    2  2010-06-10        3
b4     123    sd2    3  2008-10-21        4

Outputs:

       col1   col2 col3        date b_col
index                                                     
a2      432  dsa12    2  2016-05-20     2
a3      432  dsa12    2  2010-06-20     3

NOTE 1: the reason I need this is that I need something like groupby(...)[...].rolling(...).transform(...) with latency, which doesn't seem to exist yet, unless I am missing something?

NOTE 2: I want to avoid computing all pairs and then filtering, as the dataframe may get too big.

I have tried to get closer to your problem. However, I used merge rather than merge_asof. I hope this approach helps:

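import pandas as pd

df1 = pd.DataFrame({'index': ['a1', 'a2', 'a3', 'a4'], 'col1': ['1232', '432', '432', '123'],
                    'col2': ['asd', 'dsa12', 'dsa12', 'asd2'], 'col3': ['1', '2', '2', '3'],
                    'date': ['2010-01-23', '2016-05-20', '2010-06-20', '2008-10-21'],
                    }).set_index('index')

# b_col is included as shown in the question's printed df2
df2 = pd.DataFrame({'index': ['b1', 'b2', 'b3', 'b4'], 'col1': ['132', '432', '432', '123'],
                    'col2': ['asd', 'dsa12', 'dsa12', 'sd2'], 'col3': ['1', '2', '2', '3'],
                    'date': ['2010-01-23', '2016-05-23', '2010-06-10', '2008-10-21'],
                    'b_col': [1, 2, 3, 4],
                    }).set_index('index')

# Columns that must match exactly
columns = ['col1', 'col2', 'col3']

# Recent pandas versions reject combining `on` with `right_index=True`,
# so merge on the exact-match columns only and carry df1's index through as a column.
new_dic = (
    df1.reset_index()
       .merge(df2, on=columns)               # the shared 'date' column becomes date_x / date_y
       .drop_duplicates(subset=['date_x'])   # keeps the first match per df1 date, not necessarily the nearest
       .drop(columns='date_y')
       .set_index('index')
)

print(new_dic)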
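For comparison, here is a minimal sketch of the merge_asof route, using its by parameter for the exact-match columns. It assumes the b_col values from the question's printed df2, and it assumes direction='nearest', which is what reproduces the desired output (the default is 'backward'). Since merge_asof behaves like a left join, rows with no exact-match group are dropped afterwards.

```python
import pandas as pd

df1 = pd.DataFrame({
    'index': ['a1', 'a2', 'a3', 'a4'],
    'col1': ['1232', '432', '432', '123'],
    'col2': ['asd', 'dsa12', 'dsa12', 'asd2'],
    'col3': ['1', '2', '2', '3'],
    'date': ['2010-01-23', '2016-05-20', '2010-06-20', '2008-10-21'],
})
# df2's own labels (b1..b4) are omitted; only b_col is wanted in the result.
df2 = pd.DataFrame({
    'col1': ['132', '432', '432', '123'],
    'col2': ['asd', 'dsa12', 'dsa12', 'sd2'],
    'col3': ['1', '2', '2', '3'],
    'date': ['2010-01-23', '2016-05-23', '2010-06-10', '2008-10-21'],
    'b_col': [1, 2, 3, 4],
})

# merge_asof needs an ordered key: parse the dates and sort both frames on them.
df1['date'] = pd.to_datetime(df1['date'])
df2['date'] = pd.to_datetime(df2['date'])
left = df1.sort_values('date')
right = df2.sort_values('date')

asof = pd.merge_asof(
    left, right,
    on='date',                      # as-of (approximate) key
    by=['col1', 'col2', 'col3'],    # exact-match keys
    direction='nearest',            # assumption, see the note above
)

# Left rows without an exact-match group get NaN in b_col; drop them.
result = asof.dropna(subset=['b_col']).set_index('index').sort_index()
result['b_col'] = result['b_col'].astype(int)
print(result)
```

The date column in the result is the one from df1. If matches that are too far apart should be rejected, merge_asof also accepts a tolerance argument (a pd.Timedelta for datetime keys).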

