简体   繁体   中英

find overlaps using pandas

file1 :

col0 col1 col2 col3
1    20   10   -
1    25   30   +
2    50   40   -
2    60   70   +

file2:

col0 col1 col2 col3 col4 col5
1    25   5   -     e1    e2
1    20   30   +    e4    e24
2    45   35   -    e9    e12
2    55   75   +    e10    e21

I need to check for 3 conditions : 1) check if element in col0 is same (1 ==1)

2) if yes then check the col3 if its - or +, if - then the col1 value will be higher than col2 and the opposite if its +,

3) check if the range is within the col1 and col2 from file2 ie in row two 25-30 fall within the range 20-30 if all conditions met, out put rows in file 1 and col4 and col5 in file 2 (which are some meta info)

col0 col1 col2 col3 col4 col5
1    20   10   -    e1    e2
1    25   30   +    e4    e24
2    50   40   -    e9    e12
2    60   70   +    e10   e21

simple python code would have a condition like this

for i in file1:
    for j in file2:
        if i[1]<j[1] and i[2]<j[2] and i[0]==j[0]:
           print j

not sure how to account for the sign, How can i achieve this using pandas, or pandas would not be a right approach?

Here's one approach:

Generate sample data provided by OP

# copy file1 data from OP, then run:
df1 = pd.read_clipboard().reset_index()

# copy file2 data from OP, then run:
df2 = pd.read_clipboard().reset_index()

Merge and filter data

# merge data on row index and col0 value
df3 = (df1.merge(df2.drop('col3',1), on=['index','col0'], suffixes=('_1','_2'))
          .drop('index',1))

# sort range cols from file2
df3[['col1_2','col2_2']] = df3[['col1_2','col2_2']].apply(sorted, axis='columns')

# filter based on file1 range within file2 range, drop extra cols
df3 = (df3.loc[df3.col1_1.between(df3.col1_2, df3.col2_2) & df3.col2_1.between(df3.col1_2, df3.col2_2)]
          .drop(['col1_2','col2_2'], 1))
print(df3)

Output

   col0  col1_1  col2_1 col3 col4 col5
0     1      20      10    -   e1   e2
1     1      25      30    +   e4  e24
3     2      60      70    +  e10  e21

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM