I have two DataFrames, one with reference data and one with "experimental" data. I want to compute the error of the experimental values by subtracting the reference values. However, the experimental DataFrame is in long form and contains several rows (one per basis set) for the same index. I only want to match on the index, so that the same reference value is reused for every basis set. The index in both DataFrames is "Reaction".
Specifically, I would like to create two new columns in the experimental DataFrame, "BSE" and "BSE_CP", computed as in the following pseudo-code:
experimental['BSE'] = experimental['Delta_E'] - reference['Delta_E']
experimental['BSE_CP'] = experimental['Delta_E_CP'] - reference['Delta_E']
Naturally, I tried the code above, but it raises a ValueError:
ValueError: cannot reindex from a duplicate axis
With some manual work I can loop over the basis sets, compute the errors for each one, store them in temporary lists, and finally assign the concatenated results as new columns (here exp and ref are the experimental and reference DataFrames). The code below works, but my (limited) pandas intuition tells me that there is a simpler way.
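import pandas as pd

bses = []
bse_cps = []
# Process one basis set at a time, so that each slice has a unique Reaction index
for basis in exp.Basis_set.unique():
    exp_sub = exp.loc[exp.Basis_set == basis]
    bse = exp_sub['Delta_E'] - ref['Delta_E']
    bse_cp = exp_sub['Delta_E_CP'] - ref['Delta_E']
    bses.append(bse)
    bse_cps.append(bse_cp)
# Stitch the per-basis results back together and assign them as new columns
exp['BSE'] = pd.concat(bses, axis=0)
exp['BSE_CP'] = pd.concat(bse_cps, axis=0)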
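The per-basis loop can probably be collapsed into one lookup: reindexing the reference Series onto the experimental index repeats each reference value for every row that carries that Reaction label (only the reference index has to be unique). After that, both operands share exactly the same index, so neither the subtraction nor the column assignment has to realign anything. This is a minimal sketch on a few hypothetical rows copied from the samples below, assuming the frames are named exp and ref as in the loop; the column set is trimmed for brevity.

```python
import pandas as pd

# Hypothetical toy frames: a few rows copied from the samples in the question.
exp = pd.DataFrame(
    {
        "Basis_set": ["pc-3", "pc-3", "6-311++G(2df,2pd)", "6-311++G(2df,2pd)"],
        "Delta_E": [-24.950271, -20.674572, -25.843776, -22.012592],
        "Delta_E_CP": [-24.922770, -20.633017, -24.979298, -20.741707],
    },
    index=pd.Index(
        ["Cr-Alkene-1", "Cr-Alkene-2", "Cr-Alkene-1", "Cr-Alkene-2"], name="Reaction"
    ),
)
ref = pd.DataFrame(
    {"Delta_E": [-20.698715, -24.984980]},
    index=pd.Index(["Cr-Alkene-2", "Cr-Alkene-1"], name="Reaction"),
)

# Look up the reference value for every experimental row; the target index
# (exp.index) may contain duplicates, only ref's index must be unique.
ref_e = ref["Delta_E"].reindex(exp.index)

# Both operands now have the identical index, so no "duplicate axis" reindexing
# is triggered when the results are assigned back to exp.
exp["BSE"] = exp["Delta_E"] - ref_e
exp["BSE_CP"] = exp["Delta_E_CP"] - ref_e
print(exp)
```

Reactions that are missing from ref would simply get NaN in both new columns.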
Sample from the experimental DataFrame:
Basis_set Functional Delta_E Delta_E_CP BSSE
Reaction
Cr-Alkene-1 pc-3 PBE -24.950271 -24.922770 0.027485
Cr-Alkene-2 pc-3 PBE -20.674572 -20.633017 0.041541
Cr-Alkene-3 pc-3 PBE -9.621059 -9.560187 0.060868
Cr-Alkene-4 pc-3 PBE -15.913920 -15.821342 0.092578
Cr-Alkene-5 pc-3 PBE -9.925094 -9.836789 0.088305
Cr-Alkene-6 pc-3 PBE -16.365306 -16.266877 0.098429
Cr-CO pc-3 PBE -43.738982 -43.698595 0.040412
Cr-H2 pc-3 PBE -19.050313 -19.054649 -0.004336
Cr-MeCN pc-3 PBE -29.415768 -29.384396 0.031375
Cr-MeOH pc-3 PBE -18.165318 -18.120964 0.044365
Cr-THF pc-3 PBE -19.518354 -19.486973 0.031375
Cr-Water pc-3 PBE -16.643343 -16.582746 0.060617
Fe-MeOH pc-3 PBE -14.514893 -14.432698 0.082196
Ni-Alkene-1 pc-3 PBE -16.365802 -16.323111 0.042671
Ni-Alkene-2 pc-3 PBE -12.029692 -11.976059 0.053652
Ni-Alkene-3 pc-3 PBE -6.764403 -6.670935 0.093468
Ni-Alkene-4 pc-3 PBE -9.027397 -8.934491 0.092907
Ni-Alkene-5 pc-3 PBE -6.373132 -6.259096 0.114035
Ni-Alkene-6 pc-3 PBE -9.282549 -9.182826 0.099723
Ni-CO pc-3 PBE -29.330640 -29.271458 0.059174
Ni-MeCN pc-3 PBE -16.075560 -16.034989 0.040600
Ni-MeOH pc-3 PBE -7.261460 -7.210546 0.050891
Ni-NHC-1 pc-3 PBE -36.680622 -36.615234 0.065388
Ni-NHC-2 pc-3 PBE -36.232223 -36.115631 0.116592
Ni-THF pc-3 PBE -8.198476 -8.157920 0.040537
Ni-Water pc-3 PBE -6.052186 -5.988283 0.063902
Cr-Alkene-1 6-311++G(2df,2pd) PBE -25.843776 -24.979298 0.864478
Cr-Alkene-2 6-311++G(2df,2pd) PBE -22.012592 -20.741707 1.270885
Cr-Alkene-3 6-311++G(2df,2pd) PBE -11.692782 -9.797260 1.895522
Cr-Alkene-4 6-311++G(2df,2pd) PBE -17.853916 -15.858710 1.995206
Cr-Alkene-5 6-311++G(2df,2pd) PBE -12.365642 -10.000622 2.365020
Cr-Alkene-6 6-311++G(2df,2pd) PBE -18.460674 -16.333490 2.127184
Cr-CO 6-311++G(2df,2pd) PBE -44.629594 -43.514245 1.115349
Cr-H2 6-311++G(2df,2pd) PBE -19.422439 -19.074368 0.348071
Cr-MeCN 6-311++G(2df,2pd) PBE -30.350801 -29.453226 0.897575
Cr-MeOH 6-311++G(2df,2pd) PBE -19.176455 -18.105223 1.071232
Cr-THF 6-311++G(2df,2pd) PBE -20.776291 -19.514903 1.261388
Cr-Water 6-311++G(2df,2pd) PBE -17.581627 -16.548968 1.032659
Fe-MeOH 6-311++G(2df,2pd) PBE -15.773194 -14.295214 1.477980
Ni-Alkene-1 6-311++G(2df,2pd) PBE -17.343889 -16.455705 0.888184
Ni-Alkene-2 6-311++G(2df,2pd) PBE -13.206122 -12.113601 1.092521
Ni-Alkene-3 6-311++G(2df,2pd) PBE -8.452805 -6.882294 1.570512
Ni-Alkene-4 6-311++G(2df,2pd) PBE -10.640900 -9.033991 1.606909
Ni-Alkene-5 6-311++G(2df,2pd) PBE -8.377379 -6.450635 1.926744
Ni-Alkene-6 6-311++G(2df,2pd) PBE -11.016182 -9.303782 1.712400
Ni-CO 6-311++G(2df,2pd) PBE -30.283896 -29.304677 0.979219
Ni-MeCN 6-311++G(2df,2pd) PBE -16.837946 -16.065847 0.772099
Ni-MeOH 6-311++G(2df,2pd) PBE -8.014220 -7.085813 0.928407
Ni-NHC-1 6-311++G(2df,2pd) PBE -38.170826 -36.886904 1.283922
Ni-NHC-2 6-311++G(2df,2pd) PBE -38.598700 -36.387356 2.211343
Ni-THF 6-311++G(2df,2pd) PBE -9.091911 -8.048848 1.043063
Ni-Water 6-311++G(2df,2pd) PBE -6.754093 -5.892542 0.861551
The reference dataframe:
Delta_E
Reaction
Cr-Alkene-1 -24.984980
Cr-Alkene-2 -20.698715
Cr-Alkene-3 -9.620706
Cr-Alkene-4 -15.898494
Cr-Alkene-5 -9.984087
Cr-Alkene-6 -16.350411
Cr-Water -16.612333
Cr-MeOH -18.159461
Cr-THF -19.541941
Cr-MeCN -29.429611
Cr-CO -43.758283
Cr-H2 -19.092310
Ni-Alkene-1 -16.326735
Ni-Alkene-2 -11.955749
Ni-Alkene-3 -6.644702
Ni-Alkene-4 -8.922958
Ni-Alkene-5 -6.323173
Ni-Alkene-6 -9.171335
Ni-Water -5.925627
Ni-MeOH -7.149769
Ni-THF -8.095105
Ni-MeCN -15.941426
Ni-CO -29.236219
Ni-NHC-1 -36.582247
Ni-NHC-2 -36.093587
Fe-MeOH -14.469599
Desired output:
Basis_set Functional Delta_E Delta_E_CP BSSE BSE BSE_CP
Reaction
Cr-Alkene-1 6-31+G(d) PBE -28.366635 -26.271858 2.094777 -3.381654 -1.286877
Cr-Alkene-2 6-31+G(d) PBE -24.810519 -21.984532 2.825986 -4.111804 -1.285817
Cr-Alkene-3 6-31+G(d) PBE -14.328868 -10.097466 4.231402 -4.708163 -0.476760
Cr-Alkene-4 6-31+G(d) PBE -21.041370 -16.296561 4.744809 -5.142876 -0.398067
Cr-Alkene-5 6-31+G(d) PBE -15.232350 -9.631952 5.600398 -5.248263 0.352135
...
I would recommend merging the experimental DataFrame with the reference DataFrame on the Reaction index.
import pandas as pd
import numpy as np

# Left merge keeps every experimental row (all basis sets) and attaches the matching reference value
mergedData = pd.merge(exp, ref, how='left', on='Reaction', suffixes=('_exp', '_ref'), indicator='Exists')
Since the column Delta_E appears in both DataFrames, you can pass suffixes to disambiguate it on merge, so the merged result will have the two columns Delta_E_exp and Delta_E_ref. Finally, the indicator column Exists will have the value 'both' when the Reaction is present in both DataFrames, and that is where you want to subtract:
# Error = experimental minus reference (as in the desired output); NaN where no reference exists
mergedData['BSE'] = np.where(mergedData['Exists'] == 'both', mergedData['Delta_E_exp'] - mergedData['Delta_E_ref'], np.nan)
mergedData['BSE_CP'] = np.where(mergedData['Exists'] == 'both', mergedData['Delta_E_CP'] - mergedData['Delta_E_ref'], np.nan)
mergedData.drop('Exists', axis=1, inplace=True)  # drop the helper indicator column
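If you do not need the indicator column, DataFrame.join does the same left join directly on the index: every experimental row keeps its Reaction label and picks up the matching reference value, and a reaction without a reference just gets NaN. This is a minimal sketch of that alternative on a few hypothetical rows copied from the question's samples; the names exp, ref and merged are assumptions for illustration.

```python
import pandas as pd

# Hypothetical toy frames: a few rows copied from the samples in the question.
exp = pd.DataFrame(
    {
        "Basis_set": ["pc-3", "pc-3", "6-311++G(2df,2pd)", "6-311++G(2df,2pd)"],
        "Delta_E": [-24.950271, -20.674572, -25.843776, -22.012592],
        "Delta_E_CP": [-24.922770, -20.633017, -24.979298, -20.741707],
    },
    index=pd.Index(
        ["Cr-Alkene-1", "Cr-Alkene-2", "Cr-Alkene-1", "Cr-Alkene-2"], name="Reaction"
    ),
)
ref = pd.DataFrame(
    {"Delta_E": [-20.698715, -24.984980]},
    index=pd.Index(["Cr-Alkene-2", "Cr-Alkene-1"], name="Reaction"),
)

# join() aligns on the index and defaults to a left join; the overlapping
# reference column is renamed to Delta_E_ref via rsuffix.
merged = exp.join(ref, rsuffix="_ref")

# Both columns now live in the same frame, so the subtraction is row-wise.
merged["BSE"] = merged["Delta_E"] - merged["Delta_E_ref"]
merged["BSE_CP"] = merged["Delta_E_CP"] - merged["Delta_E_ref"]
print(merged)
```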
The pandas documentation for the merge function has more details: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html