
Pandas Comparing Two Data Frames

I have two dataframes. I will explain my requirement in the form of a loop, because that is how I visualize the problem. I realize there may be another way to solve it, so if this can be done differently, please feel free to share! I am new to Pandas and am struggling with this. Thank you in advance for looking at my question!

Both dataframes have three columns: id, Odo and OdoLength. OdoLength is the absolute difference between each Odo reading and the next one, which I computed with: abs(df1['Odo'] - df1['Odo'].shift(-1))
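As a quick illustration of that OdoLength definition, the sketch below recomputes it for the first five Odo readings of OldDataSet. It uses the shift(-1) expression from the question, plus `Series.diff(-1)`, which computes the same `odo[i] - odo[i + 1]` difference. The variable names are illustrative only. Because the final row has no following reading, the expression yields NaN there.

```python
import pandas as pd

# First five Odo readings from OldDataSet
odo = pd.Series([-1.09, 1.02, 26.12, 43.12, 46.81])

# Distance from each reading to the next one (the question's expression)
length_shift = (odo - odo.shift(-1)).abs()

# Equivalent formulation: diff(-1) computes odo[i] - odo[i + 1]
length_diff = odo.diff(-1).abs()

print(length_shift.round(2).tolist())  # [2.11, 25.1, 17.0, 3.69, nan]
```

The first four values match the OdoLength column posted for ids 10 to 40.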

OldDataSet = {'id' : [10,20,30,40,50,60,70,80,90,100,110,120,130,140],'Odo': [-1.09,1.02,26.12,43.12,46.81,56.23,111.07,166.38,191.27,196.41,207.74,231.61,235.84,240.04], 'OdoLength':[2.11,25.1,17,3.69,9.42,54.84,55.31,24.89,5.14,11.33,23.87,4.23,4.2,4.09]}

NewDataSet = {'id' : [1000,2000,3000,4000,5000,6000,7000,8000,9000,10000,11000,12000,13000,14000],'Odo': [1.51,2.68,4.72,25.03,42,45.74,55.15,110.05,165.41,170.48,172.39,190.35,195.44,206.78], 'OdoLength':[1.17,2.04,20.31,16.97,3.74,9.41,54.9,55.36,5.07,1.91,17.96,5.09,11.34,23.89]}

FinalResultDataSet = {'DFOneId':[10,20,30,40,50,60,70,80,90,100,110], 'DFTwoID' : [1000,3000,4000,5000,6000,7000,8000,11000,12000,13000,14000], 'OdoDiff': [2.6,3.7,1.09,1.12,1.07,1.08,1.02,6.01,0.92,0.97,0.96], 'OdoLengthDiff':[0.94,4.79,0.03,0.05,0.01,0.06,0.05,6.93,0.05,0.01,0.02], 'OdoAndLengthDiff':[1.66,1.09,1.06,1.07,1.06,1.02,0.97,0.92,0.87,0.96,0.94]}


import pandas as pd

df1 = pd.DataFrame(OldDataSet)

df2 = pd.DataFrame(NewDataSet)

FinalDf = pd.DataFrame(FinalResultDataSet)

The logic for building FinalDf is as follows. For each df1 record, in order, compare its Odo and OdoLength with the Odo and OdoLength of each df2 record, starting at the first df2 record that does not have a match yet. The difference for a pair is abs(abs(df1.Odo - df2.Odo) - abs(df1.OdoLength - df2.OdoLength)), and the df2 record with the lowest difference is matched to the current df1 record. Any df2 record that was compared but was not the minimum for the current df1 record is not included in the final dataset. The comparison for the next df1 record then begins with the first df2 record after the match.

For example, comparing df1 id 20 with df2 id 2000 gives abs(abs(1.02 - 2.68) - abs(25.1 - 2.04)) = abs(1.66 - 23.06) = 21.4, whereas comparing df1 id 20 with df2 id 3000 gives abs(abs(1.02 - 4.72) - abs(25.1 - 20.31)) = abs(3.70 - 4.79) = 1.09. In this case, df2 id 3000 is matched to df1 id 20, and df2 id 2000 is dropped because its difference was larger. From that point on, df2 id 2000 is not considered for any other match, so the comparison for the next df1 record starts at df2 id 4000, the next df2 record that does not have a match.
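The loop described above can be sketched directly in pandas. This is a minimal sketch under two assumptions: "take the lowest value" means scanning all remaining df2 records and keeping the first minimum, and the result is rounded to 2 decimals. `greedy_match` is a hypothetical helper name. On the posted data this reproduces the id pairs and OdoAndLengthDiff values of FinalResultDataSet; df1 ids 120 to 140 stay unmatched because df2 runs out.

```python
import pandas as pd

# Data from the question (OldDataSet / NewDataSet)
df1 = pd.DataFrame({
    'id': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140],
    'Odo': [-1.09, 1.02, 26.12, 43.12, 46.81, 56.23, 111.07, 166.38,
            191.27, 196.41, 207.74, 231.61, 235.84, 240.04],
    'OdoLength': [2.11, 25.1, 17, 3.69, 9.42, 54.84, 55.31, 24.89,
                  5.14, 11.33, 23.87, 4.23, 4.2, 4.09]})
df2 = pd.DataFrame({
    'id': [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000,
           10000, 11000, 12000, 13000, 14000],
    'Odo': [1.51, 2.68, 4.72, 25.03, 42, 45.74, 55.15, 110.05, 165.41,
            170.48, 172.39, 190.35, 195.44, 206.78],
    'OdoLength': [1.17, 2.04, 20.31, 16.97, 3.74, 9.41, 54.9, 55.36,
                  5.07, 1.91, 17.96, 5.09, 11.34, 23.89]})

def greedy_match(df1, df2):
    """Hypothetical helper: match each df1 row to its lowest-scoring unmatched df2 row."""
    rows = []
    start = 0  # position of the first df2 row that is still available
    for r in df1.itertuples(index=False):
        if start >= len(df2):
            break  # no df2 rows left to match
        cand = df2.iloc[start:]
        odo_diff = (r.Odo - cand['Odo']).abs()
        len_diff = (r.OdoLength - cand['OdoLength']).abs()
        score = (odo_diff - len_diff).abs()
        best = score.to_numpy().argmin()  # first minimum wins ties
        rows.append({'DFOneId': r.id,
                     'DFTwoID': cand['id'].iloc[best],
                     'OdoDiff': odo_diff.iloc[best],
                     'OdoLengthDiff': len_diff.iloc[best],
                     'OdoAndLengthDiff': score.iloc[best]})
        # df2 rows passed over before the match are dropped;
        # the next search starts right after the matched row
        start += best + 1
    return pd.DataFrame(rows).round(2)

result = greedy_match(df1, df2)
print(result)
```

Because each search starts after the previous match, every df2 record is used at most once and matches stay in order.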

As I said, I am open to all suggestions!

Thanks!

You can use merge_asof.

Step 1: add a matching key to each dataframe and combine them

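# Odo + OdoLength is where the next reading starts; use it as the matching key
df1['match'] = df1.Odo + df1.OdoLength
df2['match'] = df2.Odo + df2.OdoLength

# merge_asof requires both frames to be sorted by the key (they already are here);
# direction='nearest' picks the df2 row whose key is closest to each df1 key
out = pd.merge_asof(df1, df2, on='match', direction='nearest')
# several df1 rows can hit the same df2 row; keep only the first match per df2 id
out = out.drop_duplicates(['id_y'])
out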
Out[728]:
     Odo_x  OdoLength_x  id_x   match   Odo_y  OdoLength_y   id_y
0    -1.09         2.11    10    1.02    1.51         1.17   1000
1     1.02        25.10    20   26.12    4.72        20.31   3000
2    26.12        17.00    30   43.12   25.03        16.97   4000
3    43.12         3.69    40   46.81   42.00         3.74   5000
4    46.81         9.42    50   56.23   45.74         9.41   6000
5    56.23        54.84    60  111.07   55.15        54.90   7000
6   111.07        55.31    70  166.38  110.05        55.36   8000
7   166.38        24.89    80  191.27  172.39        17.96  11000
8   191.27         5.14    90  196.41  190.35         5.09  12000
9   196.41        11.33   100  207.74  195.44        11.34  13000
10  207.74        23.87   110  231.61  206.78        23.89  14000
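For reference, here is a self-contained version of Step 1 using the question's data. It checks that the resulting id pairs match the DFOneId/DFTwoID columns of FinalResultDataSet. Note that this approach matches on Odo + OdoLength (the position of the next reading) rather than running the question's sequential loop, so agreement on this sample does not guarantee the same pairing on other data.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140],
    'Odo': [-1.09, 1.02, 26.12, 43.12, 46.81, 56.23, 111.07, 166.38,
            191.27, 196.41, 207.74, 231.61, 235.84, 240.04],
    'OdoLength': [2.11, 25.1, 17, 3.69, 9.42, 54.84, 55.31, 24.89,
                  5.14, 11.33, 23.87, 4.23, 4.2, 4.09]})
df2 = pd.DataFrame({
    'id': [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000,
           10000, 11000, 12000, 13000, 14000],
    'Odo': [1.51, 2.68, 4.72, 25.03, 42, 45.74, 55.15, 110.05, 165.41,
            170.48, 172.39, 190.35, 195.44, 206.78],
    'OdoLength': [1.17, 2.04, 20.31, 16.97, 3.74, 9.41, 54.9, 55.36,
                  5.07, 1.91, 17.96, 5.09, 11.34, 23.89]})

# Matching key: position of the next reading
df1['match'] = df1.Odo + df1.OdoLength
df2['match'] = df2.Odo + df2.OdoLength

# Both keys are already in ascending order, as merge_asof requires
out = pd.merge_asof(df1, df2, on='match', direction='nearest')
# The last df1 rows all land on df2 id 14000; keep only the first of them
out = out.drop_duplicates(['id_y']).reset_index(drop=True)

print(list(zip(out['id_x'], out['id_y'])))
```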

Step 2

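Then you can compute the new column from the merged frame, using the definition from the question (the absolute difference between the absolute Odo difference and the absolute OdoLength difference):

out['OdoAndLengthDiff'] = ((out.Odo_x - out.Odo_y).abs() - (out.OdoLength_x - out.OdoLength_y).abs()).abs()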

Note that I did not drop the helper match column. Once you have all the new values, you can drop it (or any other column you no longer need) with out = out.drop(columns=['match'])
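To see how the difference columns line up with FinalResultDataSet, the sketch below rebuilds the first three matched rows from the merge_asof output table and computes OdoDiff, OdoLengthDiff and OdoAndLengthDiff. These column definitions are my reading of the question's expected data, not something it states explicitly; they reproduce the posted values for these rows. Summing the signed differences instead (Odo_x - Odo_y + OdoLength_x - OdoLength_y) gives a different quantity, e.g. 1.12 rather than 1.06 for the third row.

```python
import pandas as pd

# First three matched rows, copied from the merge_asof output table
out = pd.DataFrame({
    'id_x': [10, 20, 30],
    'Odo_x': [-1.09, 1.02, 26.12],
    'OdoLength_x': [2.11, 25.10, 17.00],
    'id_y': [1000, 3000, 4000],
    'Odo_y': [1.51, 4.72, 25.03],
    'OdoLength_y': [1.17, 20.31, 16.97],
})

# Assumed definitions, inferred from FinalResultDataSet in the question
out['OdoDiff'] = (out.Odo_x - out.Odo_y).abs()
out['OdoLengthDiff'] = (out.OdoLength_x - out.OdoLength_y).abs()
out['OdoAndLengthDiff'] = (out.OdoDiff - out.OdoLengthDiff).abs()

print(out[['id_x', 'id_y', 'OdoDiff', 'OdoLengthDiff', 'OdoAndLengthDiff']].round(2))
```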

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM