I'm trying to clean up a FastA file by keeping only the letters A, C, T, G, N, and U in the output string. I'm attempting this with a regular expression, which looks like this:
newFastA = (re.findall(r'A,T,G,C,U,N',self.fastAsequence)) #trying to extract all of the listed bases from my fastAsequence.
print (newFastA)
However, I am not getting all the occurrences of the bases in order. I think the format of my regular expression is incorrect, so if you could let me know what mistake I've made, that would be great.
I'd avoid regex entirely. You can use str.translate to remove the characters you don't want.
from string import ascii_letters
removechars = ''.join(set(ascii_letters) - set('ACTGNU'))
newFastA = self.fastAsequence.translate(None, removechars)
demo:
dna = 'ACTAGAGAUACCACG this will be removed GNUGNUGNU'
dna.translate(None, removechars)
Out[6]: 'ACTAGAGAUACCACG     GNUGNUGNU'
If you want to remove whitespace too, you can toss string.whitespace into removechars.
Side note: the above only works in Python 2. In Python 3 there's an additional step, because the deletion table has to be built with str.maketrans first:
from string import ascii_letters, punctuation, whitespace
#showing how to remove whitespace and punctuation too in this example
removechars = ''.join(set(ascii_letters + punctuation + whitespace) - set('ACTGNU'))
trans = str.maketrans('', '', removechars)
dna.translate(trans)
Out[11]: 'ACTAGAGAUACCACGGNUGNUGNU'
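For what it's worth, the Python 3 variant above can be wrapped in a small helper. This is only a sketch: the names ALLOWED, DELETE_TABLE and clean_sequence are made up for illustration, and adding string.digits to the removal set is my own assumption (digits are not covered by ascii_letters, punctuation or whitespace). Non-ASCII characters would still pass through untouched.

```python
from string import ascii_letters, digits, punctuation, whitespace

# Hypothetical names, chosen for this sketch only.
ALLOWED = set('ACTGNU')

# Every ASCII letter, digit, punctuation mark and whitespace character
# that is not one of the allowed bases gets deleted.
REMOVECHARS = ''.join(set(ascii_letters + digits + punctuation + whitespace) - ALLOWED)
DELETE_TABLE = str.maketrans('', '', REMOVECHARS)


def clean_sequence(sequence):
    """Return sequence with the characters listed in REMOVECHARS removed."""
    return sequence.translate(DELETE_TABLE)


cleaned = clean_sequence('ACTAGAGAUACCACG this will be removed 123 GNUGNUGNU')
print(cleaned)
```

Building the table once at module level means each call is a single translate pass over the string.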
print re.sub("[^ACTGNU]","",fastA_string)
to go with the million other answers you'll get
or without re
print "".join(filter(lambda character: character in set("ACTGUN"), fastA_string))
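The two one-liners above use the Python 2 print statement. A rough Python 3 equivalent might look like the following; the sample value of fastA_string and the variable names with_regex and without_regex are assumptions for the demo, not part of the original answer.

```python
import re

# Sample input, assumed for illustration.
fastA_string = 'ACTAGAGAUACCACG this will be removed GNUGNUGNU'

# With re: replace every character that is NOT in the set with nothing.
with_regex = re.sub(r'[^ACTGNU]', '', fastA_string)

# Without re: keep only characters found in the allowed set.
# The set is built once instead of once per character.
allowed = set('ACTGUN')
without_regex = ''.join(ch for ch in fastA_string if ch in allowed)

print(with_regex)
print(without_regex)
```

Both approaches preserve the original order of the bases, so the two results should be identical.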
You need to use a character set.
re.findall(r"[ATGCUN]", self.fastAsequence)
Your code looks for a LITERAL "A,T,G,C,U,N", and outputs all occurrences of that. Character sets in regex allow for a search of the type "any of the following: A, T, G, C, U, N" rather than "the following: A,T,G,C,U,N".
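To make the difference concrete, here is a small sketch comparing the two patterns on a made-up input (the sample string and variable names are assumptions for illustration). Since re.findall returns a list of matches, joining them gives back a single cleaned string.

```python
import re

# Made-up input that happens to contain the literal text "A,T,G,C,U,N".
sequence = 'ACGT xx A,T,G,C,U,N yy NNU'

# Without brackets, the pattern only matches that exact comma-separated text.
literal_matches = re.findall(r'A,T,G,C,U,N', sequence)

# With brackets, each single character that is one of the bases is a match.
base_matches = re.findall(r'[ATGCUN]', sequence)

# findall returns a list, so join it to get one string, in original order.
cleaned = ''.join(base_matches)

print(literal_matches)
print(cleaned)
```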