
Merging pandas dataframes: empty columns created in left merge

I have several datasets that I am trying to merge into one. Below, I created fictitious, simpler, smaller datasets to test the method, and it worked perfectly fine.

import pandas as pd

examplelog = pd.DataFrame({'Depth':[10,20,30,40,50,60,70,80], 
                           'TVD':[10,19.9,28.8,37.7,46.6,55.5,64.4,73.3],
                           'T1':[11,11.3,11.5,12.,12.3,12.6,13.,13.8],
                           'T2':[11.3,11.5,11.8,12.2,12.4,12.7,13.1,14.1]})

log1 = pd.DataFrame({'Depth':[30,40,50,60],'T3':[12.1,12.6,13.7,14.]})

log2 = pd.DataFrame({'Depth':[20,30,40,50,60],'T4':[12.0,12.2,12.4,13.2,14.1]})

logs = [log1, log2]

result = examplelog.copy()

# Left-merge each log onto the reference frame, matching rows on exact Depth
for log in logs:
    result = result.merge(log, how='left', on='Depth')
print(result)

The result is, as expected:

   Depth    T1    T2   TVD    T3    T4
0     10  11.0  11.3  10.0   NaN   NaN
1     20  11.3  11.5  19.9   NaN  12.0
2     30  11.5  11.8  28.8  12.1  12.2
3     40  12.0  12.2  37.7  12.6  12.4
4     50  12.3  12.4  46.6  13.7  13.2
5     60  12.6  12.7  55.5  14.0  14.1
6     70  13.0  13.1  64.4   NaN   NaN
7     80  13.8  14.1  73.3   NaN   NaN

Happy with the result, I applied this method to my actual data. However, T3 and T4 in the resulting dataframes came out as empty columns (all values were NaN). I suspect the problem is with floating-point numbers. My datasets were created on different machines by different software. Although "Depth" has a precision of two decimal places in all of the files, I am afraid a depth of 20.05 may not be stored identically in both: one file might hold 20.049999999999999 while the other holds 20.05000000000001. In that case the merge finds no matching keys, as the following example (where the depths differ slightly) shows:

examplelog = pd.DataFrame({'Depth':[10,20,30,40,50,60,70,80], 
                           'TVD':[10,19.9,28.8,37.7,46.6,55.5,64.4,73.3],
                           'T1':[11,11.3,11.5,12.,12.3,12.6,13.,13.8],
                           'T2':[11.3,11.5,11.8,12.2,12.4,12.7,13.1,14.1]})

log1 = pd.DataFrame({'Depth':[30.05,40.05,50.05,60.05],'T3':[12.1,12.6,13.7,14.]})


log2 = pd.DataFrame({'Depth':[20.01,30.01,40.01,50.01,60.01],'T4':[12.0,12.2,12.4,13.2,14.1]})

logs=[log1,log2]

result=examplelog.copy()

for i in logs:
    result=result.merge(i,how='left', on='Depth')
print(result)

   Depth    T1    T2   TVD  T3  T4
0     10  11.0  11.3  10.0 NaN NaN
1     20  11.3  11.5  19.9 NaN NaN
2     30  11.5  11.8  28.8 NaN NaN
3     40  12.0  12.2  37.7 NaN NaN
4     50  12.3  12.4  46.6 NaN NaN
5     60  12.6  12.7  55.5 NaN NaN
6     70  13.0  13.1  64.4 NaN NaN
7     80  13.8  14.1  73.3 NaN NaN

Do you know how to fix this? Thanks!
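The keys do not need to be as far apart as 30 and 30.05 for the merge to fail. `merge` compares key values for exact equality, so two floats that print alike but differ in their last bits will not match either. The sketch below uses `0.1 + 0.2` and `0.3` as illustrative stand-in depths (not the OP's data). `indicator=True` adds a `_merge` column that flags rows which found no partner.

```python
import pandas as pd

a = 0.1 + 0.2   # repr shows 0.30000000000000004
b = 0.3         # repr shows 0.3
print(a == b)   # False: the two doubles are not identical

left = pd.DataFrame({'Depth': [a], 'T1': [11.0]})
right = pd.DataFrame({'Depth': [b], 'T3': [12.1]})

# indicator=True adds a '_merge' column: 'both', 'left_only' or 'right_only'
merged = left.merge(right, how='left', on='Depth', indicator=True)
print(merged)   # T3 is NaN and _merge is 'left_only'
```

On the real data, `merged['_merge'].value_counts()` shows at a glance whether any depths matched at all.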

Round the Depth values in every frame to the same, appropriate precision:

for df in [examplelog, log1, log2]:
    df['Depth'] = df['Depth'].round(1)

import pandas as pd

examplelog = pd.DataFrame({'Depth':[10,20,30,40,50,60,70,80], 
                           'TVD':[10,19.9,28.8,37.7,46.6,55.5,64.4,73.3],
                           'T1':[11,11.3,11.5,12.,12.3,12.6,13.,13.8],
                           'T2':[11.3,11.5,11.8,12.2,12.4,12.7,13.1,14.1]})

log1 = pd.DataFrame({'Depth':[30.05,40.05,50.05,60.05],'T3':[12.1,12.6,13.7,14.]})
log2 = pd.DataFrame({'Depth':[20.01,30.01,40.01,50.01,60.01],
                     'T4':[12.0,12.2,12.4,13.2,14.1]})

# Round every Depth column to the same precision so that equal depths compare equal
for df in [examplelog, log1, log2]:
    df['Depth'] = df['Depth'].round(1)

logs = [log1, log2]
result = examplelog.copy()
for log in logs:
    result = result.merge(log, how='left', on='Depth')
print(result)

yields

   Depth    T1    T2   TVD    T3    T4
0     10  11.0  11.3  10.0   NaN   NaN
1     20  11.3  11.5  19.9   NaN  12.0
2     30  11.5  11.8  28.8  12.1  12.2
3     40  12.0  12.2  37.7  12.6  12.4
4     50  12.3  12.4  46.6  13.7  13.2
5     60  12.6  12.7  55.5  14.0  14.1
6     70  13.0  13.1  64.4   NaN   NaN
7     80  13.8  14.1  73.3   NaN   NaN
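Rounding both sides to the same number of decimals usually lines the keys up, but the merge still compares floats. The sketch below is a slightly more defensive variant. It assumes, as the question states, that the depths are meaningful to two decimal places. The depths are converted to an exact integer count of hundredths, and the merge uses that integer. The column name `Depth_key` and the `1e-12` offsets (which simulate values written by different software) are hypothetical.

```python
import pandas as pd

left = pd.DataFrame({'Depth': [20.05, 30.05], 'T1': [11.3, 11.5]})
# Same depths from "another machine", off by a tiny representation error
right = pd.DataFrame({'Depth': [20.05 + 1e-12, 30.05 - 1e-12],
                      'T3': [12.0, 12.1]})

# A plain merge on the float column finds no matches
naive = left.merge(right, how='left', on='Depth')

# Hypothetical key column: depth as an integer number of hundredths
for df in (left, right):
    df['Depth_key'] = (df['Depth'] * 100).round().astype('int64')

# Merge on the exact integer key, keeping the left frame's Depth column
fixed = left.merge(right.drop(columns='Depth'), how='left', on='Depth_key')
print(naive)
print(fixed)
```

The scale factor (100 here, for two decimals) must be the same for every frame, or the keys will not line up.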

According to the comments, rounding does not appear to work for the OP on the actual data. To debug the problem, find some rows that should merge, for example those with Depth near 20:

subframes = []
for frame in [examplelog, log2]:
    # Rows with Depth in [20.0, 20.051), which should pair up across the frames
    mask = (frame['Depth'] < 20.051) & (frame['Depth'] >= 20.0)
    subframes.append(frame.loc[mask])

Then post the output of:

for frame in subframes:
    print(frame.to_dict('list'))  # the values, in a form others can paste back in
    frame.info()                  # prints the dtypes of the columns (returns None)
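Alongside the raw values, it may help to measure how far each depth is from its closest counterpart in the other frame. A gap of around 1e-12 would point to floating-point noise that rounding can fix. A gap of 0.01 or more means the depths genuinely differ. The helper `nearest_gaps` below is hypothetical. It compares every pair of keys, so it is meant for small subsets such as `subframes`, not whole logs.

```python
import numpy as np
import pandas as pd

def nearest_gaps(left, right, col='Depth'):
    """Distance from each key in `left` to the closest key in `right` (brute force)."""
    l = left[col].to_numpy(dtype=float)[:, None]   # shape (n, 1)
    r = right[col].to_numpy(dtype=float)[None, :]  # shape (1, m)
    return pd.Series(np.abs(l - r).min(axis=1), index=left.index)

# Toy data from the question
examplelog = pd.DataFrame({'Depth': [10, 20, 30, 40, 50, 60, 70, 80]})
log2 = pd.DataFrame({'Depth': [20.01, 30.01, 40.01, 50.01, 60.01]})

gaps = nearest_gaps(examplelog, log2)
# 17 significant digits expose differences that the default display hides
print(gaps.map('{:.17g}'.format))
```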

This might give us the information we need to reproduce the problem.
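The depths may turn out to be genuinely offset (as with 30 vs 30.05 in the toy data) rather than just noisy. In that case, rounding to a coarser grid lines them up only by coincidence. pandas also provides `merge_asof`, which pairs each row with the nearest key within a tolerance. The sketch below applies it to the question's toy data. The value `tolerance=0.1` is an assumption suited to keys spaced 10 apart; data sampled every 0.01 would need a much smaller value. `merge_asof` requires both frames to be sorted on the key and the key columns to share a dtype, hence the cast to float.

```python
import pandas as pd

examplelog = pd.DataFrame({'Depth': [10, 20, 30, 40, 50, 60, 70, 80],
                           'T1': [11, 11.3, 11.5, 12., 12.3, 12.6, 13., 13.8]})
log1 = pd.DataFrame({'Depth': [30.05, 40.05, 50.05, 60.05],
                     'T3': [12.1, 12.6, 13.7, 14.]})
log2 = pd.DataFrame({'Depth': [20.01, 30.01, 40.01, 50.01, 60.01],
                     'T4': [12.0, 12.2, 12.4, 13.2, 14.1]})

# merge_asof needs sorted keys of the same dtype on both sides
result = examplelog.astype({'Depth': float}).sort_values('Depth')
for log in [log1, log2]:
    result = pd.merge_asof(result, log.sort_values('Depth'), on='Depth',
                           direction='nearest',  # closest key above or below
                           tolerance=0.1)        # assumed maximum allowed gap
print(result)
```

Unlike an exact merge, this pairs depths that are merely close. The tolerance should therefore stay below half the sampling interval, so that a row whose true counterpart is missing is not matched to a neighbouring sample.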

Notice: technical posts on this site are distributed under the CC BY-SA 4.0 license. If you repost, please credit this site or the original address. For any questions, contact yoyou2525@163.com.