
Merging pandas dataframes: empty columns created in left

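I have several datasets that I am trying to merge into one. Below, I created smaller, simpler dummy datasets to test the method, and it worked perfectly fine.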

import pandas as pd

examplelog = pd.DataFrame({'Depth':[10,20,30,40,50,60,70,80],
                           'TVD':[10,19.9,28.8,37.7,46.6,55.5,64.4,73.3],
                           'T1':[11,11.3,11.5,12.,12.3,12.6,13.,13.8],
                           'T2':[11.3,11.5,11.8,12.2,12.4,12.7,13.1,14.1]})

log1 = pd.DataFrame({'Depth':[30,40,50,60],'T3':[12.1,12.6,13.7,14.]})


log2 = pd.DataFrame({'Depth':[20,30,40,50,60],'T4':[12.0,12.2,12.4,13.2,14.1]})

logs=[log1,log2]

result=examplelog.copy()

for i in logs:
    result=result.merge(i,how='left', on='Depth')
print(result)

The result is, as expected:

   Depth    T1    T2   TVD    T3    T4
0     10  11.0  11.3  10.0   NaN   NaN
1     20  11.3  11.5  19.9   NaN  12.0
2     30  11.5  11.8  28.8  12.1  12.2
3     40  12.0  12.2  37.7  12.6  12.4
4     50  12.3  12.4  46.6  13.7  13.2
5     60  12.6  12.7  55.5  14.0  14.1
6     70  13.0  13.1  64.4   NaN   NaN
7     80  13.8  14.1  73.3   NaN   NaN

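Happy with the result, I applied this method to my actual data, but in the resulting dataframe T3 and T4 came back as empty columns (all values NaN). I suspect the problem is with floating-point numbers: my datasets were created on different machines by different software, and although Depth has a precision of two decimals in all files, I am afraid it may not be exactly 20.05 in both. One may hold 20.049999999999999 while the other holds 20.05000000000001. The merge then finds no matching keys, as in the following example: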

examplelog = pd.DataFrame({'Depth':[10,20,30,40,50,60,70,80], 
                           'TVD':[10,19.9,28.8,37.7,46.6,55.5,64.4,73.3],
                           'T1':[11,11.3,11.5,12.,12.3,12.6,13.,13.8],
                           'T2':[11.3,11.5,11.8,12.2,12.4,12.7,13.1,14.1]})

log1 = pd.DataFrame({'Depth':[30.05,40.05,50.05,60.05],'T3':[12.1,12.6,13.7,14.]})


log2 = pd.DataFrame({'Depth':[20.01,30.01,40.01,50.01,60.01],'T4':[12.0,12.2,12.4,13.2,14.1]})

logs=[log1,log2]

result=examplelog.copy()

for i in logs:
    result=result.merge(i,how='left', on='Depth')
print(result)

   Depth    T1    T2   TVD  T3  T4
0     10  11.0  11.3  10.0 NaN NaN
1     20  11.3  11.5  19.9 NaN NaN
2     30  11.5  11.8  28.8 NaN NaN
3     40  12.0  12.2  37.7 NaN NaN
4     50  12.3  12.4  46.6 NaN NaN
5     60  12.6  12.7  55.5 NaN NaN
6     70  13.0  13.1  64.4 NaN NaN
7     80  13.8  14.1  73.3 NaN NaN

Do you know how to fix this? Thanks!

Round the Depth values to the appropriate precision:

for df in [examplelog, log1, log2]:
    df['Depth'] = df['Depth'].round(1)

import numpy as np
import pandas as pd

examplelog = pd.DataFrame({'Depth':[10,20,30,40,50,60,70,80], 
                           'TVD':[10,19.9,28.8,37.7,46.6,55.5,64.4,73.3],
                           'T1':[11,11.3,11.5,12.,12.3,12.6,13.,13.8],
                           'T2':[11.3,11.5,11.8,12.2,12.4,12.7,13.1,14.1]})

log1 = pd.DataFrame({'Depth':[30.05,40.05,50.05,60.05],'T3':[12.1,12.6,13.7,14.]})
log2 = pd.DataFrame({'Depth':[20.01,30.01,40.01,50.01,60.01],
                     'T4':[12.0,12.2,12.4,13.2,14.1]})

for df in [examplelog, log1, log2]:
    df['Depth'] = df['Depth'].round(1)

logs=[log1,log2]
result=examplelog.copy()
for i in logs:
    result=result.merge(i,how='left', on='Depth')
print(result)

yields

   Depth    T1    T2   TVD    T3    T4
0     10  11.0  11.3  10.0   NaN   NaN
1     20  11.3  11.5  19.9   NaN  12.0
2     30  11.5  11.8  28.8  12.1  12.2
3     40  12.0  12.2  37.7  12.6  12.4
4     50  12.3  12.4  46.6  13.7  13.2
5     60  12.6  12.7  55.5  14.0  14.1
6     70  13.0  13.1  64.4   NaN   NaN
7     80  13.8  14.1  73.3   NaN   NaN
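
Since the question says the real depths carry two decimals, one variant of the rounding idea is to merge on an integer key holding the depth in hundredths, so the comparison no longer involves floats at all. This is only a minimal sketch under that assumption: the helper add_depth_key and the column depth_key are hypothetical names, and the near-identical depths are simulated with np.nextafter rather than taken from the OP's files.

```python
import numpy as np
import pandas as pd

# Two depths that print alike but differ in the last bit, standing in for
# files written by different programs (np.nextafter gives the adjacent float).
left = pd.DataFrame({'Depth': [20.05, 30.05, 40.05], 'T1': [11.3, 11.5, 12.0]})
right = pd.DataFrame({'Depth': [np.nextafter(20.05, np.inf), np.nextafter(30.05, 0.0)],
                      'T3': [12.1, 12.6]})

def add_depth_key(df, decimals=2):
    """Return a copy with a hypothetical integer column 'depth_key'
    holding Depth expressed in units of 10**-decimals."""
    out = df.copy()
    out['depth_key'] = (out['Depth'] * 10**decimals).round().astype('int64')
    return out

# Exact float comparison finds no partner for any row.
naive = left.merge(right, how='left', on='Depth')

# Integer keys compare exactly, so the near-identical depths now pair up.
keyed = add_depth_key(left).merge(add_depth_key(right)[['depth_key', 'T3']],
                                  how='left', on='depth_key')
print(keyed[['Depth', 'T1', 'T3']])
```

The number of decimals has to match the real resolution of the logs: too few would merge rows that are genuinely different depths, too many would leave the float noise in place.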

Per the comments, rounding does not appear to work for the OP on the actual data. To debug the problem, find some rows that should merge:

subframes = []
for frame in [examplelog, log2]:
    mask = (frame['Depth'] < 20.051) & (frame['Depth'] >= 20.0)
    subframes.append(frame.loc[mask])

and then post the output of

for frame in subframes:
    print(frame.to_dict('list')) 
    print(frame.info())          # shows the dtypes of the columns

This may give us the information we need to reproduce the problem.
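
Another way to narrow the problem down, offered as a hedged sketch rather than as part of the original answer, is an outer merge with indicator=True. The added _merge column marks each row as left_only, right_only, or both, and printing the repr of the unmatched keys exposes digits that the default display hides. The two small frames are made-up stand-ins for the real logs.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins for the real logs: the first depth differs in the last bit.
left = pd.DataFrame({'Depth': [20.05, 30.05], 'T1': [11.3, 11.5]})
right = pd.DataFrame({'Depth': [np.nextafter(20.05, np.inf), 30.05],
                      'T4': [12.0, 12.2]})

# indicator=True adds a '_merge' column: 'left_only', 'right_only' or 'both'.
check = left.merge(right, how='outer', on='Depth', indicator=True)

# Keys that failed to pair up, printed at full precision.
unmatched = check[check['_merge'] != 'both']
for side, depth in zip(unmatched['_merge'], unmatched['Depth']):
    print(side, repr(float(depth)))

# Differing key dtypes (e.g. int64 vs float64 vs object) are worth checking too.
print(left['Depth'].dtype, right['Depth'].dtype)
```

Pairs of left_only and right_only rows whose depths agree to many digits point to float noise; depths that differ visibly, or a key column with object dtype, point to a different cause.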

