How to merge/join/append two spatial overlapping datasets (From/To)?

Question

I have two datasets that share an ID and whose From/To intervals overlap. To keep this short, I'm only posting the ID that has an overlap. Wherever the From/To intervals overlap, I want to keep the rows from the second dataset, df2, but I don't know how to do that in Python. I know it's probably easiest in SQL, but I want to know whether it is possible in Python. df2 has extra variables that I want to carry along; for the variables the two datasets share, I want the df2 values rather than the df1 values wherever the From/To intervals overlap.

df1

ID	From	To	Q	RM	RQ
MRC-17	447	472	0.63	42	10
MRC-17	472	502	2.5	42	20
MRC-17	502	503.8	2.5	37	10
MRC-17	503.8	509.7	0.42	29	10
MRC-17	509.7	527	0.38	32	10
MRC-17	527	545	0.38	32	10
MRC-17	545	551	3.33	47	26.67
MRC-17	551	576	0.38	32	10
MRC-17	576	579.5	6.07	47	48.57
MRC-17	579.5	597	0.38	32	10
MRC-17	597	616	0.38	32	10
MRC-17	616	626	4.75	47	38
MRC-17	626	647	0.38	32	10
MRC-17	647	662	0.83	34	10
MRC-17	662	677	0.38	37	10

df2

ID	From	To	H	DP	DR	IV	No	RQ	RM	Q
MRC-17	499	504	1	U	S	D	7	50	32	2.08
MRC-17	504	510	2	P	R	D	7	25	32	0.78
MRC-17	510	545	0	P	K	F	9	5	18	0.02
MRC-17	545	565	0	P	K	F	8	60	28	0.33
MRC-17	565	575	0	P	K	F	9	5	18	0.02
MRC-17	575	581	1	P	K	F	7	70	34	0.49
MRC-17	581	600	0	P	K	F	8	20	23	0.11
MRC-17	600	612	0	P	K	F	9	5	18	0.02
MRC-17	612	634	1	P	S	C	7	70	38	2.92
MRC-17	634	647	0	P	S	F	9	5	22	0.04
MRC-17	647	662	2	P	S	B	7	55	39	4.58
MRC-17	662	677	0	P	S	F	9	15	22	0.13

The result I want, Final, is below (-99 marks a missing numeric value, X a missing character value):

ID	From	To	H	DP	DR	IV	No	RQ	RM	Q
MRC-17	447	472	-99	X	X	X	-99	10	42	0.63
MRC-17	472	499	-99	X	X	X	-99	20	42	2.50
MRC-17	499	504	1	U	S	D	7	50	32	2.08
MRC-17	504	510	2	P	R	D	7	25	32	0.78
MRC-17	510	545	0	P	K	F	9	5	18	0.02
MRC-17	545	565	0	P	K	F	8	60	28	0.33
MRC-17	565	575	0	P	K	F	9	5	18	0.02
MRC-17	575	581	1	P	K	F	7	70	34	0.49
MRC-17	581	600	0	P	K	F	8	20	23	0.11
MRC-17	600	612	0	P	K	F	9	5	18	0.02
MRC-17	612	634	1	P	S	C	7	70	38	2.92
MRC-17	634	647	0	P	S	F	9	5	22	0.04
MRC-17	647	662	2	P	S	B	7	55	39	4.58
MRC-17	662	677	0	P	S	F	9	15	22	0.13

Thanks in advance for all the help!

So far all I've done is load the data:

# Load libraries
import pandas as pd
import numpy as np
from scipy import stats

df1 = pd.read_csv('LOGGED_DATA.csv')
df2 = pd.read_csv('PHOTOLOGGED_DATA.csv')

But I'm having a hard time figuring out how to go about this. I looked at inner, outer, and other joins, but the overlapping intervals throw them off!

Answer 1

With the dataframes you provided:

df1 = pd.DataFrame({'ID': ['MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17'], 'From': [447.0, 472.0, 502.0, 503.8, 509.7, 527.0, 545.0, 551.0, 576.0, 579.5, 597.0, 616.0, 626.0, 647.0, 662.0], 'To': [472.0, 502.0, 503.8, 509.7, 527.0, 545.0, 551.0, 576.0, 579.5, 597.0, 616.0, 626.0, 647.0, 662.0, 677.0], 'Q': [0.63, 2.5, 2.5, 0.42, 0.38, 0.38, 3.33, 0.38, 6.07, 0.38, 0.38, 4.75, 0.38, 0.83, 0.38], 'RM': [42, 42, 37, 29, 32, 32, 47, 32, 47, 32, 32, 47, 32, 34, 37], 'RQ': [10.0, 20.0, 10.0, 10.0, 10.0, 10.0, 26.67, 10.0, 48.57, 10.0, 10.0, 38.0, 10.0, 10.0, 10.0]})

df2 = pd.DataFrame({'ID': ['MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17', 'MRC-17'], 'From': [499, 504, 510, 545, 565, 575, 581, 600, 612, 634, 647, 662], 'To': [504, 510, 545, 565, 575, 581, 600, 612, 634, 647, 662, 677], 'H': [1, 2, 0, 0, 0, 1, 0, 0, 1, 0, 2, 0], 'DP': ['U', 'P', 'P', 'P', 'P', 'P', 'P', 'P', 'P', 'P', 'P', 'P'], 'DR': ['S', 'R', 'K', 'K', 'K', 'K', 'K', 'K', 'S', 'S', 'S', 'S'], 'IV': ['D', 'D', 'F', 'F', 'F', 'F', 'F', 'F', 'C', 'F', 'B', 'F'], 'No': [7, 7, 9, 8, 9, 7, 8, 9, 7, 9, 7, 9], 'RQ': [50, 25, 5, 60, 5, 70, 20, 5, 70, 5, 55, 15], 'RM': [32, 32, 18, 28, 18, 34, 23, 18, 38, 22, 39, 22], 'Q': [2.08, 0.78, 0.02, 0.33, 0.02, 0.49, 0.11, 0.02, 2.92, 0.04, 4.58, 0.13]})

Here is one way to do it:

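# Drop the df1 rows whose "From" lies within df2's "From" values
# or whose "To" lies within df2's "To" values
mask = (
    (df1["From"] >= df2["From"].min()) & (df1["From"] <= df2["From"].max())
) | (
    (df1["To"] >= df2["To"].min()) & (df1["To"] <= df2["To"].max())
)
df1 = df1[~mask].copy()  # copy so the assignment below does not act on a view

# Trim the last remaining df1 row so that it ends where df2 begins
df1.loc[df1.index[-1], "To"] = df2["From"].min()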

# Make new dataframe from sliced df1 and df2, do some cleanup
new_df = (
    pd.concat([df1, df2])
    .fillna(value={"H": -99, "No": -99, "DP": "X", "DR": "X", "IV": "X"})
    .reindex(
        ["ID", "From", "To", "H", "DP", "DR", "IV", "No", "RQ", "RM", "Q"],
        axis="columns",
    )
    .astype(
        {"From": "int32", "To": "int32", "H": "int32", "No": "int32", "RQ": "int32"}
    )
)

And so:

        ID  From   To   H DP DR IV  No  RQ  RM     Q
0   MRC-17   447  472 -99  X  X  X -99  10  42  0.63
1   MRC-17   472  499 -99  X  X  X -99  20  42  2.50
0   MRC-17   499  504   1  U  S  D   7  50  32  2.08
1   MRC-17   504  510   2  P  R  D   7  25  32  0.78
2   MRC-17   510  545   0  P  K  F   9   5  18  0.02
3   MRC-17   545  565   0  P  K  F   8  60  28  0.33
4   MRC-17   565  575   0  P  K  F   9   5  18  0.02
5   MRC-17   575  581   1  P  K  F   7  70  34  0.49
6   MRC-17   581  600   0  P  K  F   8  20  23  0.11
7   MRC-17   600  612   0  P  K  F   9   5  18  0.02
8   MRC-17   612  634   1  P  S  C   7  70  38  2.92
9   MRC-17   634  647   0  P  S  F   9   5  22  0.04
10  MRC-17   647  662   2  P  S  B   7  55  39  4.58
11  MRC-17   662  677   0  P  S  F   9  15  22  0.13
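
The snippet above works for a single ID and for the case where only one df1 row straddles the start of df2's range. If the data holds several IDs, or df1 also extends past the end of df2, the same idea can be applied per ID. What follows is only a sketch under stated assumptions, not a tested production routine: the function name prefer_second and the helper columns lo and hi are made up for illustration, and it assumes that df2's intervals are contiguous within each ID, so that df2's coverage for an ID is the single span from its smallest From to its largest To. The small frames at the end are invented test data, not the data from the question.

```python
import pandas as pd


def prefer_second(df1, df2, key="ID", start="From", end="To"):
    """Keep df2 wherever it has coverage; keep df1 only outside that span.

    Assumption: within each key, df2's intervals are contiguous, so its
    coverage is the single span [min(start), max(end)].
    """
    cols = list(df1.columns)

    # Span covered by df2 for each key ("lo" and "hi" are helper names).
    span = df2.groupby(key).agg(lo=(start, "min"), hi=(end, "max"))
    d = df1.join(span, on=key)

    # Keys absent from df2 get NaN for lo/hi; keep those rows unchanged.
    untouched = d.loc[d["lo"].isna(), cols]

    # Piece of each df1 interval that lies before df2's span.
    left = d.loc[d[start] < d["lo"]].copy()
    left[end] = left[[end, "lo"]].min(axis=1)

    # Piece of each df1 interval that lies after df2's span.
    right = d.loc[d[end] > d["hi"]].copy()
    right[start] = right[[start, "hi"]].max(axis=1)

    parts = [p for p in (untouched, left[cols], right[cols], df2) if not p.empty]
    if not parts:
        return df1.iloc[:0]
    return pd.concat(parts).sort_values([key, start], ignore_index=True)


# Invented example data: df2 covers 8-25 for ID "A" and nothing for ID "B".
df1 = pd.DataFrame(
    {"ID": ["A", "A", "A", "B"], "From": [0, 10, 20, 0],
     "To": [10, 20, 30, 5], "Q": [1.0, 2.0, 3.0, 9.0]}
)
df2 = pd.DataFrame(
    {"ID": ["A", "A"], "From": [8, 15], "To": [15, 25],
     "Q": [7.0, 8.0], "H": [1, 0]}
)

result = prefer_second(df1, df2)
# ID "A": 0-8 (df1, trimmed), 8-15 and 15-25 (df2), 25-30 (df1, trimmed)
# ID "B": 0-5 (df1, unchanged)
print(result)
```

Columns that exist only in df2 (H here) come out as NaN on the df1 rows, so the fillna, reindex and astype steps from the answer can be chained onto the result in the same way. If df2 can have gaps inside an ID, the single-span assumption no longer holds and the df1 rows would need to be split against each df2 interval instead.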
