
How to select rows with certain value between 2 columns from another DataFrame in pandas?

For example, I have two DataFrames: the first is the one I want to select rows from, and the second contains the criteria for the selection.

df1 = pd.DataFrame({'chr': {0: 7, 1: 7, 2: 7, 3: 7, 4: 7, 5: 7, 6: 7},
 0: {0: 55241686,
  1: 55242415,
  2: 55248986,
  3: 55259412,
  4: 55260459,
  5: 55266410,
  6: 55268009},
 1: {0: 55241736,
  1: 55242513,
  2: 55249171,
  3: 55259567,
  4: 55260534,
  5: 55266556,
  6: 55268064}})

df1

df2 = pd.DataFrame({'chr': {0: 7,
  1: 7,
  2: 7,
  3: 7,
  4: 7,
  5: 7,
  6: 7,
  7: 7,
  8: 7,
  9: 7,
  10: 7,
  11: 7,
  12: 7,
  13: 7,
  14: 7,
  15: 7,
  16: 7,
  17: 7,
  18: 7,
  19: 7},
 's': {0: 55241646,
  1: 55241658,
  2: 55241690,
  3: 55241718,
  4: 55241721,
  5: 55241722,
  6: 55241727,
  7: 55241732,
  8: 55242454,
  9: 55242457,
  10: 55242488,
  11: 55242511,
  12: 55248991,
  13: 55248995,
  14: 55248995,
  15: 55249000,
  16: 55249022,
  17: 55249036,
  18: 55249053,
  19: 55249057},
 'e': {0: 55241646,
  1: 55241658,
  2: 55241690,
  3: 55241718,
  4: 55241721,
  5: 55241722,
  6: 55241727,
  7: 55241732,
  8: 55242454,
  9: 55242457,
  10: 55242488,
  11: 55242511,
  12: 55248991,
  13: 55248995,
  14: 55248995,
  15: 55249000,
  16: 55249022,
  17: 55249036,
  18: 55249053,
  19: 55249057},
 'ref': {0: 'T',
  1: 'T',
  2: 'A',
  3: 'G',
  4: 'C',
  5: 'G',
  6: 'G',
  7: 'A',
  8: 'G',
  9: 'G',
  10: 'C',
  11: 'G',
  12: 'C',
  13: 'G',
  14: 'G',
  15: 'G',
  16: 'G',
  17: 'G',
  18: 'C',
  19: 'C'},
 'alt': {0: 'C',
  1: 'G',
  2: 'C',
  3: 'A',
  4: 'T',
  5: 'A',
  6: 'A',
  7: 'G',
  8: 'A',
  9: 'A',
  10: 'T',
  11: 'A',
  12: 'G',
  13: 'A',
  14: 'C',
  15: 'A',
  16: 'C',
  17: 'A',
  18: 'G',
  19: 'T'}})

Only a small part of df2 is shown here.

df2

What I want to achieve is:

For each row in df1 (row_df1), check whether it matches some row in df2 (row_df2), where a match means (row_df1['chr'] == row_df2['chr']) & (row_df1[0] >= row_df2['s']) & (row_df1[1] <= row_df2['e']).

In brief: if the row falls into one of the intervals defined by df2['s'] and df2['e'], return it.

I believe the best approach is to merge both DataFrames first on a common column, in your case "chr". As I understand it, you want all rows of df1 whose 'chr' also exists in df2, so you can do:

merged_df = df1.merge(df2, on='chr', how='left') 

In merge you can also pass indicator=True, which adds a new column called "_merge" that indicates the source of each row ("left_only", "right_only" or "both").

Now that your data is merged, you can use simple conditions to select the rows you need, for example:
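To illustrate what the "_merge" column looks like, here is a minimal sketch. The frames `left` and `right` and their values are made up for this example and are not the data from the question.

```python
import pandas as pd

# Toy frames (hypothetical values, not the question's data)
left = pd.DataFrame({"chr": [7, 7, 8], 0: [100, 200, 300], 1: [150, 250, 350]})
right = pd.DataFrame({"chr": [7, 7], "s": [120, 210], "e": [130, 220]})

# indicator=True adds a "_merge" column telling where each row came from
merged = left.merge(right, on="chr", how="left", indicator=True)

# Rows with chr == 7 are paired with every chr == 7 row of `right` ("both");
# the chr == 8 row has no partner and is marked "left_only"
print(merged[["chr", "_merge"]])
```

With a left merge, rows of the left frame that have no partner are kept with NaN in the right-hand columns, which is why they show up as "left_only".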

merged_df.loc[(merged_df[0] >= merged_df['s']) & (merged_df[1] <= merged_df['e'])]
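Put together, the merge and the filter might look like the sketch below. It uses the condition as stated in the question (start >= 's' and end <= 'e') on small made-up frames, since the intervals in the posted df2 sample all have s equal to e. The final `drop_duplicates` step is my own addition, so that each df1 row is returned once even if it matches several intervals.

```python
import pandas as pd

# Toy data (hypothetical values): df1 holds the rows to select,
# df2 holds the intervals [s, e] used as selection criteria
df1 = pd.DataFrame({"chr": [7, 7, 7], 0: [10, 40, 70], 1: [20, 50, 80]})
df2 = pd.DataFrame({"chr": [7, 7], "s": [5, 45], "e": [25, 60]})

# Pair every df1 row with every df2 row on the same chromosome
merged = df1.merge(df2, on="chr", how="inner")

# Keep pairs where the df1 range lies inside the df2 interval
mask = (merged[0] >= merged["s"]) & (merged[1] <= merged["e"])

# Reduce back to the df1 columns, one line per matching df1 row
matches = merged.loc[mask, ["chr", 0, 1]].drop_duplicates().reset_index(drop=True)
print(matches)
```

Note that merging on "chr" alone pairs every df1 row with every df2 row on the same chromosome, so the intermediate frame can become large when both frames are big.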

Alternatively, you could store the result of the condition in a new column, for example using apply.
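A rough sketch of the new-column variant is shown below. The column names "start", "end" and "inside" are hypothetical names introduced here for readability, and the data is made up. A row-wise apply is used because the answer mentions it, although the vectorized comparison gives the same result and is usually faster.

```python
import pandas as pd

# Toy data (hypothetical values); 0 and 1 are renamed to avoid
# mixing integer and string column labels inside apply
df1 = pd.DataFrame({"chr": [7, 7, 7], 0: [10, 40, 70], 1: [20, 50, 80]})
df1 = df1.rename(columns={0: "start", 1: "end"})
df2 = pd.DataFrame({"chr": [7, 7], "s": [5, 45], "e": [25, 60]})

merged = df1.merge(df2, on="chr", how="left")

# Row-wise check: is [start, end] contained in [s, e]?
merged["inside"] = merged.apply(
    lambda row: bool(row["s"] <= row["start"] and row["end"] <= row["e"]),
    axis=1,
)

# One flag per df1 row: True if it falls inside any df2 interval
flags = merged.groupby(["chr", "start", "end"])["inside"].any().reset_index()
print(flags)
```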
