简体   繁体   中英

pandas two dataframes multiple column values comparison

I have a few very large datasets with x, y and z values. These datasets represent interpolated height measurements in time. The first dataset (the original) contains the data for the entire area. In time parts of the area have been measured again. I want to overwrite the original dataset in the location where x and y are equal but z is different (the height has changed at location(x,y)).

So my dataframes look something like this

Original:

x    y    z
1    1    0.5
1    2    0.5
1    3    0.5
2    1    0.5
2    2    0.5
2    3    0.5
3    1    0.5
3    2    0.5
3    3    0.5

New measurement:

x    y    z
0    1    0.5
0    2    0.5
1    1    1.5
1    2    0.5
2    1    0.5
2    2    1.0

The final dataframe should look like this:

x    y    z
1    1    1.5
1    2    0.5
1    3    0.5
2    1    0.5
2    2    1.0
2    3    0.5
3    1    0.5
3    2    0.5
3    3    0.5

I can loop through all the measurements and see of the x and y occur in the original and if the z value is different (if so, replace it) but this takes forever and I can imagine that there must be a better way using pandas. How would I do this in a fast and efficient way?

Given that 'Original' is df1 and 'New Measurement' is df2 :

df3 = df1.set_index(['x', 'y'])
df3.update(df2.set_index(['x', 'y']))  # Inplace modificatioin on df3.
>>> df3.reset_index()
   x  y    z
0  1  1  1.5
1  1  2  0.5
2  1  3  0.5
3  2  1  0.5
4  2  2  1.0
5  2  3  0.5
6  3  1  0.5
7  3  2  0.5
8  3  3  0.5

You can use

merge on df1 and df2 with x, y keys
assign new column z with fillna using z_x, z_y
drop these unwanted columns

In [716]: (df1.merge(df2, on=['x', 'y'], how='left')
              .assign(z=lambda x: x.z_y.fillna(x.z_x))
              .drop(['z_x', 'z_y'], 1))
Out[716]:
   x  y    z
0  1  1  1.5
1  1  2  0.5
2  1  3  0.5
3  2  1  0.5
4  2  2  1.0
5  2  3  0.5
6  3  1  0.5
7  3  2  0.5
8  3  3  0.5

Details

In [717]: df1.merge(df2, on=['x', 'y'], how='left')
Out[717]:
   x  y  z_x  z_y
0  1  1  0.5  1.5
1  1  2  0.5  0.5
2  1  3  0.5  NaN
3  2  1  0.5  0.5
4  2  2  0.5  1.0
5  2  3  0.5  NaN
6  3  1  0.5  NaN
7  3  2  0.5  NaN
8  3  3  0.5  NaN
original[(original.x == new.x) | (original.y == new.y)].z = new.z

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM