Searching for a set of words in Python

Question

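Simply put, I'm looking for the fastest way to search a string for a set of words using a regular expression, without a for loop. I.e. is there a way to do this: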

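import re

text = 'asdfadfgargqerno_TP53_dfgnafoqwefe_ATM_cvafukyhfjakhdfialb'
genes = set(['TP53','ATM','BRCA2'])
mutations = 0
# Desired usage (pseudocode): re.search() does not actually accept a set as the pattern
if re.search( genes, text):
    mutations += 1
print(mutations)
# desired output: 1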

The reason is that I'm searching a complex data structure and don't want to nest too many loops. Here is the problem code in more detail:

genes = set(['TP53','ATM','BRCA2'])
single_gene = 'ATM'
mutations = 0
data_dict = {
             'sample1': set(['AAA','BBB','TP53']),
             'sample2': set(['AAA','ATM','TP53']),
             'sample3': set(['AAA','CCC','XXX']),
             'sample4': set(['AAA','ZZZ','BRCA2'])
            }

for sample in data_dict:
    for gene in data_dict[sample]:
        if re.search( single_gene, gene):
            mutations += 1
            break

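I can easily search for 'single_gene', but I want to search for 'genes'. If I add another for loop to iterate over 'genes', the code gets more complicated, because I would have to add another 'break' and a boolean to control when the break happens. Functionally it works, but it's clumsy; there must be a more elegant way. See the clumsy extra loop over the set below (currently my only solution):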

for sample in data_dict:
    for gene in data_dict[sample]:
        MUT = False
        for mut in genes:
            if re.search( mut, gene):
                mutations += 1
                MUT = True
                break
        if MUT == True:
            break

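Important: for each sample I only want to add 0 or 1 to 'mutations', depending on whether any gene from 'genes' appears in that sample's set. I.e. 'sample2' adds 1 to mutations, while 'sample3' adds 0. Please let me know if anything needs further clarification. Thanks in advance!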

Answer 1

If the target strings are fixed text (i.e. not regular expressions), don't use re. This is more efficient:

for gene in genes:
    if gene in text:
        print('True')

There are many variations on this theme, such as:

if [gene for gene in genes if gene in text]:
    ...

This version is more confusing: it involves a list comprehension and relies on the fact that an empty list [] is considered false in Python.
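A short-circuiting form of the same check, sketched here as an assumption rather than as part of the original answer, uses the built-in any() with a generator expression. It stops at the first match and never builds the intermediate list. The sample text and gene names are copied from the question.

```python
# Sample data copied from the question.
text = 'asdfadfgargqerno_TP53_dfgnafoqwefe_ATM_cvafukyhfjakhdfialb'
genes = {'TP53', 'ATM', 'BRCA2'}

# any() consumes the generator lazily and returns True at the first hit,
# so no temporary list is created.
found = any(gene in text for gene in genes)
print(found)  # True
```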

Updated, because the question changed:

You're still doing it the hard way. Consider this instead:

def find_any_gene(genes, text):
    """Returns True if any of the subsequences in genes
       is found within text.
    """
    for gene in genes:
        if gene in text:
            return True
    return False

mutations = 0

for sample in data_dict:
    for gene in data_dict[sample]:
        # genes is the set of names to look for; gene is one entry of the sample
        if find_any_gene(genes, gene):
            mutations += 1
            break  # count each sample at most once

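This has several advantages: it takes less code to short-circuit the search, it is more readable, and the function find_any_gene() can be called from other code.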
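The per-sample count can also be sketched more compactly, assuming the data_dict layout from the question (a sample name mapped to a set of gene strings). The helper name sample_has_mutation is hypothetical and introduced only for illustration.

```python
# Data copied from the question, written as a valid dict literal.
genes = {'TP53', 'ATM', 'BRCA2'}
data_dict = {
    'sample1': {'AAA', 'BBB', 'TP53'},
    'sample2': {'AAA', 'ATM', 'TP53'},
    'sample3': {'AAA', 'CCC', 'XXX'},
    'sample4': {'AAA', 'ZZZ', 'BRCA2'},
}

def sample_has_mutation(sample_genes, genes):
    """Return True if any name in genes is a substring of any sample entry."""
    return any(mut in entry for entry in sample_genes for mut in genes)

# Each sample contributes 0 or 1, because True counts as 1 in sum().
mutations = sum(sample_has_mutation(entries, genes)
                for entries in data_dict.values())
print(mutations)  # 3
```

If the sample entries are always exact gene names rather than longer strings containing them, `not genes.isdisjoint(entries)` would be enough and avoids the inner loop entirely.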

Answer 2

Would this work? I used some of the examples from the comments.

Let me know if I'm close.

genes = set(['TP53','ATM','BRCA2', 'aaC', 'CDH'])
mutations = 0
data_dict = {
             "sample1":set(['AAA','BBB','TP53']),
             "sample2":set(['AAA','ATM','TP53']),
             "sample3":set(['AAA','CCC','XXX']),
             "sample4":set(['123CDH47aaCDHzz','ZZZ','BRCA2'])
            }

for sample in data_dict:
    for gene in data_dict[sample]:
        if [ mut for mut in genes if mut in gene ]:
            print("Found mutation: " + str(gene), end=" ")
            print("in sample: " + str(data_dict[sample]))
            mutations += 1

print(mutations)
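For completeness, if a regular expression is still wanted, as the question originally asked, the set can be joined into a single alternation pattern so that one search call covers every name. This is only a sketch and not taken from either answer; for fixed strings the plain `in` test shown in Answer 1 is usually simpler.

```python
import re

# Sample data copied from the question.
genes = {'TP53', 'ATM', 'BRCA2'}
text = 'asdfadfgargqerno_TP53_dfgnafoqwefe_ATM_cvafukyhfjakhdfialb'

# re.escape guards against names that contain regex metacharacters.
# Sorting longest-first makes the alternation try the longer name first
# when one name is a prefix of another.
pattern = re.compile(
    '|'.join(re.escape(g) for g in sorted(genes, key=len, reverse=True))
)

mutations = 0
if pattern.search(text):  # one call, no explicit loop over genes
    mutations += 1
print(mutations)  # 1
```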

2 solutions

Solution 1
1 accepted 2015-09-01 14:48:39

Solution 2
0 2015-09-01 15:49:21
