
Regular expression - such that a string does not contain specific characters

I want a regular expression that checks whether a string contains any character other than "A", "G", "C", or "U". A valid string would look like ggggugcccgcuagagagacagu.

I want the regex to check that the string contains only these characters, and the check should not be case sensitive.

What I tried:

match= re.match(r'[^GaAgUuCc]',seq2)

The goal is to find non-RNA characters in an RNA sequence.

Use re.search instead:

>>> re.search(r'[^GAUC]', 'acg', re.I)
>>> re.search(r'[^GAUC]', 'acgf', re.I)
<_sre.SRE_Match object at 0x7f1b6a9e32a0>

re.I makes the regex case-insensitive.
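To show why re.match falls short here, the sketch below compares it with re.search on the same input. re.match only tests the pattern at the start of the string, so a negated character class is checked against the first character alone. The names NON_RNA and has_non_rna are hypothetical and used only for illustration; Python 3 is assumed.

```python
import re

# Any single character that is not A, C, G, or U; re.I ignores case.
NON_RNA = re.compile(r'[^GAUC]', re.I)

def has_non_rna(seq):
    """Return True if seq contains at least one character outside A, C, G, U."""
    return NON_RNA.search(seq) is not None

# match() only looks at position 0, where 'a' is a valid character.
print(NON_RNA.match('acgf'))   # None
# search() scans the whole string and finds 'f'.
print(has_non_rna('acgf'))     # True
print(has_non_rna('acg'))      # False
```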

A faster way to do it would be to use sets and check whether the set of characters in the string is a subset of the allowed characters:

>>> set('acg'.upper()) <= set('GAUC')
True
>>> set('acgs'.upper()) <= set('GAUC')
False
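The subset check can be wrapped in a small helper, sketched below. The names ALLOWED and is_rna are hypothetical. Note that an empty string passes the check, since the empty set is a subset of every set; whether that is acceptable depends on the use case.

```python
# Allowed RNA bases, stored once as an immutable set.
ALLOWED = frozenset('GAUC')

def is_rna(seq):
    """Return True if every character of seq is A, C, G, or U (any case)."""
    return set(seq.upper()) <= ALLOWED

print(is_rna('ggggugcccgcuagagagacagu'))  # True
print(is_rna('acgs'))                     # False
print(is_rna(''))                         # True: the empty set is a subset of any set
```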

You need to use a quantifier with your regex to match more than one character:

>>> match = re.search("[^GAUC]+","ggggugcccgcuagrrragagacagu", re.I)
>>> match
<_sre.SRE_Match object at 0x01BCA8A8>
>>> match.group()
'rrr'
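If the position of each invalid run matters as well, the same quantified pattern can be used with re.finditer. This is a minimal sketch; the helper name invalid_runs is hypothetical and Python 3 is assumed.

```python
import re

def invalid_runs(seq):
    """Return (start, end, text) for each run of consecutive non-RNA characters."""
    return [(m.start(), m.end(), m.group())
            for m in re.finditer(r'[^GAUC]+', seq, re.I)]

print(invalid_runs('ggggugcccgcuagrrragagacagu'))  # [(14, 17, 'rrr')]
```

The start and end values follow Python slice conventions, so seq[14:17] gives 'rrr'.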

You should use re.search() or re.findall() rather than re.match():

In [9]: seq2 = 'ggggugcccQgcuagagaZgacagu'

In [10]: re.findall(r'[^GaAgUuCc]',seq2)
Out[10]: ['Q', 'Z']
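Building on the findall call, a sequence with repeated offenders can be reduced to its distinct invalid characters. This is only a sketch: invalid_chars is a hypothetical helper, and it uses re.I instead of listing both cases in the character class. It assumes Python 3.7+, where dict preserves insertion order.

```python
import re

def invalid_chars(seq):
    """Return the distinct non-RNA characters in seq, in order of first appearance."""
    found = re.findall(r'[^GAUC]', seq, re.I)
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(found))

print(invalid_chars('ggggugcccQgcuagagaZgacaguQ'))  # ['Q', 'Z']
```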
