简体   繁体   中英

Subtract column in DataFrame1 from matching indices in long-form DataFrame2

I have two dataframes, one with reference data and one with "experimental" data. I want to compute the error associated with the experimental values by subtracting the reference values. However, the experimental DataFrame is in long form, and contains several variables for the same index. I want only to match the indices, such that sometimes the same reference value is used in the subtraction. The index in both dataframes is "Reaction".

Specifically I would like to create two new columns in the experimental dataframe called "BSE" and "BSE_CP". These should be computed as shown by the following pseudo-code

experimental['BSE'] = experimental['Delta_E'] - reference['Delta_E']
experimental['BSE_CP'] = experimental['Delta_E_CP'] - reference['Delta_E']

Naturally I have tried the above code, but it returns a ValueError:

ValueError: cannot reindex from a duplicate axis

I can do some manual labor, and loop over the various basis sets and compute the errors, then store these in temporary list, and finally assign the concatenated dataframe as a new variable. The below code works, but my (limited) pandas intuition tells me that there is a simpler way.

bses = []
bse_cps = []
for basis in exp.Basis_set.unique():
    exp_sub = exp.loc[exp.Basis_set == basis]
    bse = exp_sub['Delta_E'] - ref['Delta_E']
    bse_cp = exp_sub['Delta_E_CP'] - ref['Delta_E']
    bses.append(bse)
    bse_cps.append(bse_cp)
    
exp['BSE'] = pd.concat(bses, axis=0)
exp['BSE_CP'] = pd.concat(bse_cps, axis=0)

Sample from experimental dataframe:

                         Basis_set Functional    Delta_E  Delta_E_CP       BSSE
Reaction                                                                       
Cr-Alkene-1                   pc-3        PBE -24.950271  -24.922770   0.027485
Cr-Alkene-2                   pc-3        PBE -20.674572  -20.633017   0.041541
Cr-Alkene-3                   pc-3        PBE  -9.621059   -9.560187   0.060868
Cr-Alkene-4                   pc-3        PBE -15.913920  -15.821342   0.092578
Cr-Alkene-5                   pc-3        PBE  -9.925094   -9.836789   0.088305
Cr-Alkene-6                   pc-3        PBE -16.365306  -16.266877   0.098429
Cr-CO                         pc-3        PBE -43.738982  -43.698595   0.040412
Cr-H2                         pc-3        PBE -19.050313  -19.054649  -0.004336
Cr-MeCN                       pc-3        PBE -29.415768  -29.384396   0.031375
Cr-MeOH                       pc-3        PBE -18.165318  -18.120964   0.044365
Cr-THF                        pc-3        PBE -19.518354  -19.486973   0.031375
Cr-Water                      pc-3        PBE -16.643343  -16.582746   0.060617
Fe-MeOH                       pc-3        PBE -14.514893  -14.432698   0.082196
Ni-Alkene-1                   pc-3        PBE -16.365802  -16.323111   0.042671
Ni-Alkene-2                   pc-3        PBE -12.029692  -11.976059   0.053652
Ni-Alkene-3                   pc-3        PBE  -6.764403   -6.670935   0.093468
Ni-Alkene-4                   pc-3        PBE  -9.027397   -8.934491   0.092907
Ni-Alkene-5                   pc-3        PBE  -6.373132   -6.259096   0.114035
Ni-Alkene-6                   pc-3        PBE  -9.282549   -9.182826   0.099723
Ni-CO                         pc-3        PBE -29.330640  -29.271458   0.059174
Ni-MeCN                       pc-3        PBE -16.075560  -16.034989   0.040600
Ni-MeOH                       pc-3        PBE  -7.261460   -7.210546   0.050891
Ni-NHC-1                      pc-3        PBE -36.680622  -36.615234   0.065388
Ni-NHC-2                      pc-3        PBE -36.232223  -36.115631   0.116592
Ni-THF                        pc-3        PBE  -8.198476   -8.157920   0.040537
Ni-Water                      pc-3        PBE  -6.052186   -5.988283   0.063902
Cr-Alkene-1      6-311++G(2df,2pd)        PBE -25.843776  -24.979298   0.864478
Cr-Alkene-2      6-311++G(2df,2pd)        PBE -22.012592  -20.741707   1.270885
Cr-Alkene-3      6-311++G(2df,2pd)        PBE -11.692782   -9.797260   1.895522
Cr-Alkene-4      6-311++G(2df,2pd)        PBE -17.853916  -15.858710   1.995206
Cr-Alkene-5      6-311++G(2df,2pd)        PBE -12.365642  -10.000622   2.365020
Cr-Alkene-6      6-311++G(2df,2pd)        PBE -18.460674  -16.333490   2.127184
Cr-CO            6-311++G(2df,2pd)        PBE -44.629594  -43.514245   1.115349
Cr-H2            6-311++G(2df,2pd)        PBE -19.422439  -19.074368   0.348071
Cr-MeCN          6-311++G(2df,2pd)        PBE -30.350801  -29.453226   0.897575
Cr-MeOH          6-311++G(2df,2pd)        PBE -19.176455  -18.105223   1.071232
Cr-THF           6-311++G(2df,2pd)        PBE -20.776291  -19.514903   1.261388
Cr-Water         6-311++G(2df,2pd)        PBE -17.581627  -16.548968   1.032659
Fe-MeOH          6-311++G(2df,2pd)        PBE -15.773194  -14.295214   1.477980
Ni-Alkene-1      6-311++G(2df,2pd)        PBE -17.343889  -16.455705   0.888184
Ni-Alkene-2      6-311++G(2df,2pd)        PBE -13.206122  -12.113601   1.092521
Ni-Alkene-3      6-311++G(2df,2pd)        PBE  -8.452805   -6.882294   1.570512
Ni-Alkene-4      6-311++G(2df,2pd)        PBE -10.640900   -9.033991   1.606909
Ni-Alkene-5      6-311++G(2df,2pd)        PBE  -8.377379   -6.450635   1.926744
Ni-Alkene-6      6-311++G(2df,2pd)        PBE -11.016182   -9.303782   1.712400
Ni-CO            6-311++G(2df,2pd)        PBE -30.283896  -29.304677   0.979219
Ni-MeCN          6-311++G(2df,2pd)        PBE -16.837946  -16.065847   0.772099
Ni-MeOH          6-311++G(2df,2pd)        PBE  -8.014220   -7.085813   0.928407
Ni-NHC-1         6-311++G(2df,2pd)        PBE -38.170826  -36.886904   1.283922
Ni-NHC-2         6-311++G(2df,2pd)        PBE -38.598700  -36.387356   2.211343
Ni-THF           6-311++G(2df,2pd)        PBE  -9.091911   -8.048848   1.043063
Ni-Water         6-311++G(2df,2pd)        PBE  -6.754093   -5.892542   0.861551

The reference dataframe:

               Delta_E
Reaction              
Cr-Alkene-1 -24.984980
Cr-Alkene-2 -20.698715
Cr-Alkene-3  -9.620706
Cr-Alkene-4 -15.898494
Cr-Alkene-5  -9.984087
Cr-Alkene-6 -16.350411
Cr-Water    -16.612333
Cr-MeOH     -18.159461
Cr-THF      -19.541941
Cr-MeCN     -29.429611
Cr-CO       -43.758283
Cr-H2       -19.092310
Ni-Alkene-1 -16.326735
Ni-Alkene-2 -11.955749
Ni-Alkene-3  -6.644702
Ni-Alkene-4  -8.922958
Ni-Alkene-5  -6.323173
Ni-Alkene-6  -9.171335
Ni-Water     -5.925627
Ni-MeOH      -7.149769
Ni-THF       -8.095105
Ni-MeCN     -15.941426
Ni-CO       -29.236219
Ni-NHC-1    -36.582247
Ni-NHC-2    -36.093587
Fe-MeOH     -14.469599

Desired output

             Basis_set Functional    Delta_E  Delta_E_CP      BSSE       BSE    BSE_CP
Reaction                                                                              
Cr-Alkene-1  6-31+G(d)        PBE -28.366635  -26.271858  2.094777 -3.381654 -1.286877
Cr-Alkene-2  6-31+G(d)        PBE -24.810519  -21.984532  2.825986 -4.111804 -1.285817
Cr-Alkene-3  6-31+G(d)        PBE -14.328868  -10.097466  4.231402 -4.708163 -0.476760
Cr-Alkene-4  6-31+G(d)        PBE -21.041370  -16.296561  4.744809 -5.142876 -0.398067
Cr-Alkene-5  6-31+G(d)        PBE -15.232350   -9.631952  5.600398 -5.248263  0.352135
...

I would recommend that you merge the experimental data frame with the reference data frame on the Reaction Id.

import pandas as pd 
import numpy as np

mergedData= pd.merge(ref,exp_sub, how='left' ,on='Reaction',  suffixes=('_ref', '_exp'),indicator ='Exists')

since you have the column Delta_E with the same name in both data frames, you can specify a suffix for the column name on merge. Meaning the merged result will have two columns Delta_E_ref, and Delta_E_exp. Finally, the indicator Exists, will have the value of 'both' when the reaction id is in both data frames, this is where you want to substract:

mergedData['bse']=np.nan
mergedData['bse_cp']=np.nan

mergedData['bse'] = np.where(mergedData['Exists']=='both',mergedData['Delta_E_ref'] - mergedData['Delta_E_exp'] , np.nan)
mergedData['bse_cp '] = np.where(mergedData['Exists']=='both',mergedData['Delta_E_CP'] - mergedData['Delta_E_exp'] , np.nan)

mergedData.drop('Exists',axis=1, inplace=True) ## droping the Exists column 

This is the link from the pandas library on the merge function if you want to know more: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM