
Pandas: How to check if any of a list in a dataframe column is present in a range in another dataframe?

I'm trying to compare two bioinformatic DataFrames (one with transcription start and end genomic locations, and one with expression data). I need to check if any of a list of locations in one DataFrame is present within ranges defined by the start and end locations in the other DataFrame, returning rows/ids where they match.

I have tried several built-in methods (.isin, .where, .query), but I usually get stuck because lists are unhashable. I've also tried nested for loops with iterrows and itertuples, which are exceedingly slow (my actual datasets have thousands of entries).

import pandas as pd

tss_df = pd.DataFrame(data={'id': ['gene1', 'gene2'],
   'locs': [[21, 23], [34, 39]]})
exp_df = pd.DataFrame(data={'gene': ['geneA', 'geneB'],
   'start': [15, 31], 'end': [25, 42]})

I'm looking to find that the row with id 'gene1' in tss_df has locations (locs) that match 'geneA' in exp_df.

The output would be something like:

output = pd.DataFrame(data={'id':['gene1','gene2'],
   'locs': [[21,23],[34,39]],
   'match': ['geneA','geneB']})
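
One possible way to get this output without Python-level loops is to expand each list into one row per location with DataFrame.explode, pair every location with every range via a cross merge, and keep the pairs that fall inside a range. This is only a sketch under some assumptions: it needs pandas 1.2 or later for how='cross', it treats start and end as inclusive, and the names long_df, pairs, hits and match are made up for illustration. The cross merge builds (number of locations) x (number of ranges) rows, so it may need chunking for much larger inputs.

```python
import pandas as pd

tss_df = pd.DataFrame({'id': ['gene1', 'gene2'],
                       'locs': [[21, 23], [34, 39]]})
exp_df = pd.DataFrame({'gene': ['geneA', 'geneB'],
                       'start': [15, 31], 'end': [25, 42]})

# One row per (id, location): the list cells become plain integers.
long_df = tss_df.explode('locs').rename(columns={'locs': 'loc'})
long_df['loc'] = long_df['loc'].astype('int64')

# Pair every location with every range, then keep start <= loc <= end.
pairs = long_df.merge(exp_df, how='cross')
hits = pairs[pairs['loc'].between(pairs['start'], pairs['end'])]

# Collapse back to one row per id, joining the distinct matching genes.
match = (hits.groupby('id')['gene']
             .agg(lambda s: ','.join(sorted(set(s))))
             .rename('match')
             .reset_index())

# Left merge keeps ids with no match (their 'match' value is NaN).
output = tss_df.merge(match, on='id', how='left')
print(output)
```

With the test data above, gene1 is paired with geneA and gene2 with geneB. An id whose locations fall in more than one range would get a comma-separated string in match.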

Edit: Based on a comment below, I tried merge_asof:

pd.merge_asof(tss_df, exp_df, left_on='locs', right_on='start')

This gave me an incompatible merge keys error, I suspect because I'm comparing a list to an integer, so I split out the first value of each list in locs:

# Take the first location from each row's list (tss_df['locs'][0] would return only row 0's list).
tss_df['loc1'] = [locs[0] for locs in tss_df['locs']]
pd.merge_asof(tss_df, exp_df, left_on='loc1', right_on='start')

This appears to have worked for my test data, but I'll need to try it with my actual data!
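
A caveat worth checking on real data: merge_asof only matches each key to the nearest start at or below it, so it never looks at end, and it requires both key columns to be sorted. A hedged sketch of how every location (not just the first) could be checked this way is below. It assumes the ranges in exp_df do not overlap, and the names long_df, ranges and nearest are illustrative only.

```python
import pandas as pd

tss_df = pd.DataFrame({'id': ['gene1', 'gene2'],
                       'locs': [[21, 23], [34, 39]]})
exp_df = pd.DataFrame({'gene': ['geneA', 'geneB'],
                       'start': [15, 31], 'end': [25, 42]})

# One row per location, with an integer key that matches the dtype of 'start'.
long_df = tss_df.explode('locs').rename(columns={'locs': 'loc'})
long_df['loc'] = long_df['loc'].astype('int64')

# merge_asof needs both key columns sorted in ascending order.
long_df = long_df.sort_values('loc')
ranges = exp_df.sort_values('start')

# For each loc, pick the range with the largest start <= loc (the default 'backward' direction).
nearest = pd.merge_asof(long_df, ranges, left_on='loc', right_on='start')

# merge_asof ignores 'end', so drop locations that lie past the end of the chosen range.
# Rows with no range at all have NaN in 'end' and are dropped by the same comparison.
nearest = nearest[nearest['loc'] <= nearest['end']]
print(nearest[['id', 'loc', 'gene']])
```

If ranges can overlap, the nearest start is not necessarily the only range containing a location, so this approach could miss matches in that case.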

