Pandas: How to check if any of a list in a dataframe column is present in a range in another dataframe?

I'm trying to compare two bioinformatic DataFrames (one with transcription start and end genomic locations, and one with expression data). I need to check if any of a list of locations in one DataFrame is present within ranges defined by the start and end locations in the other DataFrame, returning rows/ids where they match.

I have tried a number of built-in methods (.isin, .where, .query), but usually get stuck because the lists are unhashable. I've also tried a nested for loop with iterrows and itertuples, which is exceedingly slow (my actual datasets have thousands of entries).

import pandas as pd

tss_df = pd.DataFrame(data={'id':['gene1','gene2'],
   'locs':[[21,23],[34,39]]})
exp_df = pd.DataFrame(data={'gene':['geneA','geneB'],
   'start': [15,31], 'end': [25,42]})

I'm looking to find that the row with id 'gene1' in tss_df has locations (locs) that match 'geneA' in exp_df.

The output would be something like:

output = pd.DataFrame(data={'id':['gene1','gene2'],
   'locs': [[21,23],[34,39]],
   'match': ['geneA','geneB']})
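
One way to get a result of this shape without nested loops is sketched below. It is not from the original post: it assumes pandas 1.2 or later (for `how='cross'`), treats both ends of each range as inclusive, and the names `long_df`, `pairs`, `hits`, `matches` and `result` are made up for illustration. The idea is to explode the lists into one scalar location per row, pair every location with every range, and keep the pairs where the location falls inside the range.

```python
import pandas as pd

tss_df = pd.DataFrame({'id': ['gene1', 'gene2'],
                       'locs': [[21, 23], [34, 39]]})
exp_df = pd.DataFrame({'gene': ['geneA', 'geneB'],
                       'start': [15, 31], 'end': [25, 42]})

# One row per (id, single location) instead of one list per id.
long_df = tss_df.explode('locs', ignore_index=True).rename(columns={'locs': 'loc'})
long_df['loc'] = long_df['loc'].astype('int64')

# Pair every location with every range, then keep the pairs where the
# location lies inside [start, end] (assumption: both ends inclusive).
pairs = long_df.merge(exp_df, how='cross')
hits = pairs[pairs['loc'].between(pairs['start'], pairs['end'])]

# Collapse back to one row per id; an id may match more than one gene,
# so 'match' holds a list here rather than a single string.
matches = (hits.drop_duplicates(['id', 'gene'])
               .groupby('id')['gene'].agg(list)
               .rename('match').reset_index())
result = tss_df.merge(matches, on='id', how='left')
print(result)
```

The cross merge builds (number of locations) x (number of ranges) rows, which should be manageable for a few thousand entries per frame but grows quickly beyond that.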

Edit: Based on a comment below, I tried playing with merge_asof:

pd.merge_asof(tss_df,exp_df,left_on='locs',right_on='start')

This gave me an incompatible merge keys error, I suspect because I'm comparing a list to an integer, so I split out the first value in locs:

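# Take the first element of each row's list. Note that tss_df['locs'][0]
# would instead return the whole list stored in the first row.
tss_df['loc1'] = tss_df['locs'].apply(lambda locs: locs[0])
pd.merge_asof(tss_df,exp_df,left_on='loc1',right_on='start')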

This appears to have worked for my test data, but I'll need to try it with my actual data!
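
Two caveats apply to the merge_asof approach: it only finds the nearest start at or before the key, so it never looks at end, and using a single value from locs ignores the other locations in the list. A minimal sketch that addresses both is below. It assumes the ranges in exp_df do not overlap and that both ends are inclusive; the names `long_df`, `candidates`, `inside` and `matched` are illustrative only.

```python
import pandas as pd

tss_df = pd.DataFrame({'id': ['gene1', 'gene2'],
                       'locs': [[21, 23], [34, 39]]})
exp_df = pd.DataFrame({'gene': ['geneA', 'geneB'],
                       'start': [15, 31], 'end': [25, 42]})

# One row per location so that every entry in locs is tested.
long_df = tss_df.explode('locs', ignore_index=True).rename(columns={'locs': 'loc'})
long_df['loc'] = long_df['loc'].astype('int64')

# merge_asof requires both key columns to be sorted and of the same dtype.
candidates = pd.merge_asof(long_df.sort_values('loc'),
                           exp_df.sort_values('start'),
                           left_on='loc', right_on='start')

# The asof join only guarantees start <= loc, so check the end explicitly.
# Rows with no earlier start have NaN in 'end' and are dropped here as well.
inside = candidates[candidates['loc'] <= candidates['end']]
matched = inside[['id', 'gene']].drop_duplicates()
print(matched)
```

If ranges can overlap, merge_asof returns only the range with the nearest start, so some matches would be missed and a different approach would be needed.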

