How to merge/join/append two spatial overlapping datasets (From/To)?

I have two datasets that share an ID and whose From/To intervals overlap. To keep this short, I'm only posting the one ID that overlaps. Wherever the From/To intervals overlap, I want to keep the rows from the second dataset, df2, but I don't know how to do that in Python. I know it's probably easier in SQL, but I'd like to know whether it is possible in Python. df2 has extra variables that I want to come along for the ride; for the variables the two datasets share, I want the df2 values rather than the df1 values wherever the intervals overlap.

df1

ID From To Q RM RQ
MRC-17 447 472 0.63 42 10
MRC-17 472 502 2.5 42 20
MRC-17 502 503.8 2.5 37 10
MRC-17 503.8 509.7 0.42 29 10
MRC-17 509.7 527 0.38 32 10
MRC-17 527 545 0.38 32 10
MRC-17 545 551 3.33 47 26.67
MRC-17 551 576 0.38 32 10
MRC-17 576 579.5 6.07 47 48.57
MRC-17 579.5 597 0.38 32 10
MRC-17 597 616 0.38 32 10
MRC-17 616 626 4.75 47 38
MRC-17 626 647 0.38 32 10
MRC-17 647 662 0.83 34 10
MRC-17 662 677 0.38 37 10

df2

ID From To H DP DR IV No RQ RM Q
MRC-17 499 504 1 U S D 7 50 32 2.08
MRC-17 504 510 2 P R D 7 25 32 0.78
MRC-17 510 545 0 P K F 9 5 18 0.02
MRC-17 545 565 0 P K F 8 60 28 0.33
MRC-17 565 575 0 P K F 9 5 18 0.02
MRC-17 575 581 1 P K F 7 70 34 0.49
MRC-17 581 600 0 P K F 8 20 23 0.11
MRC-17 600 612 0 P K F 9 5 18 0.02
MRC-17 612 634 1 P S C 7 70 38 2.92
MRC-17 634 647 0 P S F 9 5 22 0.04
MRC-17 647 662 2 P S B 7 55 39 4.58
MRC-17 662 677 0 P S F 9 15 22 0.13

The desired result, Final (-99 marks a missing numeric value, X a missing character value):

ID From To H DP DR IV No RQ RM Q
MRC-17 447 472 -99 X X X -99 10 42 0.63
MRC-17 472 499 -99 X X X -99 20 42 2.50
MRC-17 499 504 1 U S D 7 50 32 2.08
MRC-17 504 510 2 P R D 7 25 32 0.78
MRC-17 510 545 0 P K F 9 5 18 0.02
MRC-17 545 565 0 P K F 8 60 28 0.33
MRC-17 565 575 0 P K F 9 5 18 0.02
MRC-17 575 581 1 P K F 7 70 34 0.49
MRC-17 581 600 0 P K F 8 20 23 0.11
MRC-17 600 612 0 P K F 9 5 18 0.02
MRC-17 612 634 1 P S C 7 70 38 2.92
MRC-17 634 647 0 P S F 9 5 22 0.04
MRC-17 647 662 2 P S B 7 55 39 4.58
MRC-17 662 677 0 P S F 9 15 22 0.13

Thanks in advance for all the help!

So far all I've done is load the data:

# Load libraries
import pandas as pd
import numpy as np
from scipy import stats

df1 = pd.read_csv('LOGGED_DATA.csv')
df2 = pd.read_csv('PHOTOLOGGED_DATA.csv')

But I'm having a hard time figuring out how to go about this. I looked at inner, outer, and other joins, but the overlapping intervals are throwing them off!

With the dataframes you provided:

df1 = pd.DataFrame({'ID': ['MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17'], 'From': [447.0, 472.0, 502.0, 503.8, 509.7, 527.0, 545.0, 551.0, 576.0, 579.5, 597.0, 616.0, 626.0, 647.0, 662.0], 'To': [472.0, 502.0, 503.8, 509.7, 527.0, 545.0, 551.0, 576.0, 579.5, 597.0, 616.0, 626.0, 647.0, 662.0, 677.0], 'Q': [0.63, 2.5, 2.5, 0.42, 0.38, 0.38, 3.33, 0.38, 6.07, 0.38, 0.38, 4.75, 0.38, 0.83, 0.38], 'RM': [42, 42, 37, 29, 32, 32, 47, 32, 47, 32, 32, 47, 32, 34, 37], 'RQ': [10.0, 20.0, 10.0, 10.0, 10.0, 10.0, 26.67, 10.0, 48.57, 10.0, 10.0, 38.0, 10.0, 10.0, 10.0]})

df2 = pd.DataFrame({'ID': ['MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17'], 'From': [499, 504, 510, 545, 565, 575, 581, 600, 612, 634, 647, 662], 'To': [504, 510, 545, 565, 575, 581, 600, 612, 634, 647, 662, 677], 'H': [1, 2, 0, 0, 0, 1, 0, 0, 1, 0, 2, 0], 'DP': ['U', 'P', 'P', 'P', 'P', 'P', 'P', 'P', 'P', 'P', 'P', 'P'], 'DR': ['S', 'R', 'K', 'K', 'K', 'K', 'K', 'K', 'S', 'S', 'S', 'S'], 'IV': ['D', 'D', 'F', 'F', 'F', 'F', 'F', 'F', 'C', 'F', 'B', 'F'], 'No': [7, 7, 9, 8, 9, 7, 8, 9, 7, 9, 7, 9], 'RQ': [50, 25, 5, 60, 5, 70, 20, 5, 70, 5, 55, 15], 'RM': [32, 32, 18, 28, 18, 34, 23, 18, 38, 22, 39, 22], 'Q': [2.08, 0.78, 0.02, 0.33, 0.02, 0.49, 0.11, 0.02, 2.92, 0.04, 4.58, 0.13]})

Here is one way to do it:

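# Drop the df1 rows whose "From" falls within the range of df2's "From" values
# or whose "To" falls within the range of df2's "To" values (bounds inclusive)
starts_inside = df1["From"].between(df2["From"].min(), df2["From"].max())
ends_inside = df1["To"].between(df2["To"].min(), df2["To"].max())
df1 = df1[~(starts_inside | ends_inside)].copy()

# Trim the last remaining df1 interval so that it ends where df2 begins
df1.loc[df1.index[-1], "To"] = df2["From"].min()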

# Make new dataframe from sliced df1 and df2, do some cleanup
new_df = (
    pd.concat([df1, df2])
    .fillna(value={"H": -99, "No": -99, "DP": "X", "DR": "X", "IV": "X"})
    .reindex(
        ["ID", "From", "To", "H", "DP", "DR", "IV", "No", "RQ", "RM", "Q"],
        axis="columns",
    )
    .astype(
        {"From": "int32", "To": "int32", "H": "int32", "No": "int32", "RQ": "int32"}
    )
)

And so:

        ID  From   To   H DP DR IV  No  RQ  RM     Q
0   MRC-17   447  472 -99  X  X  X -99  10  42  0.63
1   MRC-17   472  499 -99  X  X  X -99  20  42  2.50
0   MRC-17   499  504   1  U  S  D   7  50  32  2.08
1   MRC-17   504  510   2  P  R  D   7  25  32  0.78
2   MRC-17   510  545   0  P  K  F   9   5  18  0.02
3   MRC-17   545  565   0  P  K  F   8  60  28  0.33
4   MRC-17   565  575   0  P  K  F   9   5  18  0.02
5   MRC-17   575  581   1  P  K  F   7  70  34  0.49
6   MRC-17   581  600   0  P  K  F   8  20  23  0.11
7   MRC-17   600  612   0  P  K  F   9   5  18  0.02
8   MRC-17   612  634   1  P  S  C   7  70  38  2.92
9   MRC-17   634  647   0  P  S  F   9   5  22  0.04
10  MRC-17   647  662   2  P  S  B   7  55  39  4.58
11  MRC-17   662  677   0  P  S  F   9  15  22  0.13
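
Note that the approach above relies on df2 running all the way to the end of df1 and on there being a single ID. If df2 only covered the middle of df1, or if there were several IDs, something more general would be needed. Below is a minimal sketch of one possible generalization, not a definitive implementation. The function name `overlay_intervals` and its `fill_values` parameter are made up for illustration, and the sketch assumes that, for each ID, the df2 intervals form one contiguous span with no gaps. The example frames are a reduced subset of the data in the question (fewer rows and columns, same values).

```python
import pandas as pd


def overlay_intervals(df1, df2, fill_values=None):
    """Keep df2 wherever it has coverage; keep only the parts of df1 outside it.

    Assumption: for each ID, the df2 intervals form one contiguous span.
    """
    # Extent of df2's coverage for each ID
    span = df2.groupby("ID").agg(lo=("From", "min"), hi=("To", "max")).reset_index()
    d = df1.merge(span, on="ID", how="left")

    # Rows whose ID does not appear in df2 are kept unchanged
    untouched = d[d["lo"].isna()]
    d = d[d["lo"].notna()]

    # Part of each df1 interval that lies before df2's span
    left = d[d["From"] < d["lo"]].copy()
    left["To"] = left[["To", "lo"]].min(axis=1)

    # Part of each df1 interval that lies after df2's span
    right = d[d["To"] > d["hi"]].copy()
    right["From"] = right[["From", "hi"]].max(axis=1)

    # Skip empty pieces, drop the helper columns, then append df2
    pieces = [
        p.drop(columns=["lo", "hi"]) for p in (untouched, left, right) if not p.empty
    ]
    out = pd.concat(pieces + [df2]).sort_values(["ID", "From"], ignore_index=True)
    if fill_values:
        out = out.fillna(value=fill_values)
    return out


# Reduced subset of the question's data: df2 covers only 499-504 here
a = pd.DataFrame(
    {
        "ID": ["MRC-17"] * 4,
        "From": [447.0, 472.0, 502.0, 503.8],
        "To": [472.0, 502.0, 503.8, 509.7],
        "Q": [0.63, 2.5, 2.5, 0.42],
    }
)
b = pd.DataFrame({"ID": ["MRC-17"], "From": [499], "To": [504], "H": [1], "Q": [2.08]})

result = overlay_intervals(a, b, fill_values={"H": -99})
print(result)
```

With this subset, the df1 interval 472-502 is clipped to 472-499, the interval 502-503.8 is dropped because it lies entirely inside df2's span, and 503.8-509.7 is clipped to 504-509.7. The columns come out in order of first appearance, so a `reindex` like the one above is still needed if a specific column order is wanted.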
