Add columns to pandas dataframe from two separate dataframes with condition

I'll admit that this question is quite specific. I'm trying to write a function that reads two time columns (same label) in separate dataframes, df1['gps'] and df2['gps']. I want to look for elements in the first column which are close to those in the second column, not necessarily in the same row. When the condition on time distance is met, I want to save the close elements of df1['gps'] and df2['gps'] in a new dataframe called coinc, in separate columns coinc['gps1'] and coinc['gps2'], in the fastest and most efficient way. This is my code:

def find_coinc(df1, df2=None, tdelta=.25, shift=0):
    index_boolean = False
    if df2 is None:
        df2 = df1.copy()

    coincs = pd.DataFrame()
    for _, r1 in tqdm(df1.iterrows(), total=len(df1)):
        ctrig = df2.loc[abs(r1.gps+shift-df2.gps)<tdelta]
        print(r1.gps)
        coincs_single = pd.DataFrame()
        if len(ctrig)>0:
            coincs_single['gps1'] = r1.gps
            coincs_single['gps2'] = ctrig.gps                               
            coincs = pd.concat((coincs, coincs_single), axis = 0, ignore_index=index_boolean)                
            index_boolean=True
        else:
            pass
    return coincs 

The script runs fine, but when investigating the output, I find that one column of coinc is all NaN, and I don't understand why. Test case with generated data:

a = pd.DataFrame()    #define dataframes and fill them
b = pd.DataFrame()
a['gps'] = [0.12, 0.13, 0.6, 0.7]
b['gps'] = [0.1, 0.3, 0.5, 0.81, 0.82, 0.83]

find_coinc(a, b, 0.16, 0)

The output yielded is:

    gps1 gps2
0   NaN 0.10
1   NaN 0.10
2   NaN 0.50
3   NaN 0.81
4   NaN 0.82
5   NaN 0.83

How can I write coinc so that both columns turn out fine?

Well, here is another solution. Instead of concatenating two dataframes, just add new rows to the 'coincs' DataFrame, as shown below.

import pandas as pd
from tqdm import tqdm

def find_coinc(df1, df2=None, tdelta=.25, shift=0):
    if df2 is None:
        df2 = df1.copy()

    coincs = pd.DataFrame(columns=['gps1', 'gps2'])
    for _, r1 in tqdm(df1.iterrows(), total=len(df1)):
        ctrig = df2.loc[abs(r1.gps+shift-df2.gps) < tdelta]
        if len(ctrig)>0:
            for ctrig_value in ctrig['gps']:
                # Add n rows based on 'ctrig' length.
                coincs.loc[len(coincs)] = [r1.gps, ctrig_value]
        else:
            pass
    return coincs

# -------------------

a = pd.DataFrame()  # define dataframes and fill them
b = pd.DataFrame()
a['gps'] = [0.12, 0.13, 0.6, 0.7]
b['gps'] = [0.1, 0.3, 0.5, 0.81, 0.82, 0.83]

coins = find_coinc(a, b, 0.16, 0)

print('\n\n')
print(coins.to_string())

Result:

   gps1  gps2
0  0.12  0.10
1  0.13  0.10
2  0.60  0.50
3  0.70  0.81
4  0.70  0.82
5  0.70  0.83

I hope this helps! :D
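For context on why the original version produced NaN in gps1, here is a minimal sketch that reproduces the behaviour outside the loop. It assumes the usual pandas rule that assigning a scalar to a column of a frame with an empty index creates a zero-length column, and that a later Series assignment makes the empty frame adopt the Series' index. The names single and matches are only illustrative.

```python
import pandas as pd

# An empty frame has an empty index, so a scalar has no rows to fill.
single = pd.DataFrame()
single['gps1'] = 0.7
rows_after_scalar = len(single)  # still zero rows

# Assigning a Series to the still-empty frame makes it adopt the Series'
# index; the existing 'gps1' column is reindexed and padded with NaN.
matches = pd.Series([0.81, 0.82, 0.83], index=[3, 4, 5], name='gps')
single['gps2'] = matches

print(rows_after_scalar)
print(single)
```

Writing the rows explicitly, as in the function above, avoids the problem because every row receives both values at the same time.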

So the issue is that there are multiple elements in df2['gps'] which satisfy the condition of being within a time window of df1['gps']. I think I found a solution, but I am looking for a better one if possible. The modified line in the original function is marked with a ### FIX UPDATE comment:

def find_coinc(df1, df2=None, tdelta=.25, shift=0):
    index_boolean = False
    if df2 is None:
        df2 = df1.copy()

    coincs = pd.DataFrame()
    for _, r1 in tqdm(df1.iterrows(), total=len(df1)):
        ctrig = df2.loc[abs(r1.gps+shift-df2.gps)<tdelta]
        ctrig.reset_index(drop=True, inplace=True)
        coincs_single = pd.DataFrame()
        if len(ctrig)>0:
            coincs_single['gps1'] = [r1.gps]*len(ctrig)  ### FIX UPDATE
            coincs_single['gps2'] = ctrig.gps 
            print(ctrig.gps)
            coincs = pd.concat((coincs, coincs_single), axis = 0, ignore_index=index_boolean)
            index_boolean=True
        else:
            pass
    return coincs 

The solution I chose, since I want to keep every instance where the condition is met, was to write the same element of df1['gps'] into coinc['gps1'] as many times as needed.
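Since the question asks for the fastest approach, one possible alternative is to drop the row-by-row loop and compare all pairs at once with NumPy broadcasting. This is only a sketch: find_coinc_vectorized is a hypothetical name, it is not the code from either answer, and it assumes both frames have a numeric 'gps' column and that the full len(df1) x len(df2) comparison matrix fits in memory.

```python
import numpy as np
import pandas as pd

def find_coinc_vectorized(df1, df2=None, tdelta=0.25, shift=0.0):
    """Hypothetical vectorized variant of find_coinc (illustrative only)."""
    if df2 is None:
        df2 = df1
    t1 = df1['gps'].to_numpy()
    t2 = df2['gps'].to_numpy()
    # Boolean matrix of shape (len(t1), len(t2)): True where the shifted
    # df1 time is strictly closer than tdelta to the df2 time.
    close = np.abs(t1[:, None] + shift - t2[None, :]) < tdelta
    # Row and column positions of every matching pair, in row-major order.
    i, j = np.nonzero(close)
    return pd.DataFrame({'gps1': t1[i], 'gps2': t2[j]})

a = pd.DataFrame({'gps': [0.12, 0.13, 0.6, 0.7]})
b = pd.DataFrame({'gps': [0.1, 0.3, 0.5, 0.81, 0.82, 0.83]})
coinc = find_coinc_vectorized(a, b, 0.16, 0)
print(coinc.to_string())
```

On the test data this should give the same six pairs as the loop-based versions. For very long columns the pairwise matrix can become too large, in which case sorting one column and looking up window bounds with np.searchsorted may be a better fit.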
