I am dealing with a string over the 'ACGT' alphabet (a genetic sequence), padded with the letter 'N' at the beginning and at the end:
NNN...NNACGT...GGCTAANNNN...NNN
I would like to find the positions where the actual sequence begins and ends. This could easily be done with regular expressions, but I would like a simpler solution using basic Python string operations. Your suggestions would be appreciated.
To get the remainder (removing padding from left and right) it seems like all you need is:
<YourString>.strip('N')
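If you need the indices, use lstrip and rstrip instead. The values below are the 1-based positions of the first and last letters of the sequence; drop the +1 to get a 0-based start, in which case <YourString>[sStart:sEnd] is the unpadded sequence: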
sStart = len(<YourString>)-len(<YourString>.lstrip('N'))+1
sEnd = len(<YourString>.rstrip('N'))
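The same idea can be wrapped in a small function. This is only a minimal sketch: the name sequence_bounds and its pad parameter are made up for illustration, and it assumes 0-based indices with an exclusive end, so the result can be used directly in a slice. It returns None when the string is empty or consists only of padding.

```python
def sequence_bounds(s, pad="N"):
    """Return (start, end) so that s[start:end] is s without its padding.

    Hypothetical helper: indices are 0-based and end is exclusive.
    Returns None if s is empty or contains nothing but padding.
    """
    start = len(s) - len(s.lstrip(pad))  # number of leading pad characters
    end = len(s.rstrip(pad))             # length once trailing padding is removed
    if start >= end:
        return None
    return start, end


s = "NNNNNACGTGGCTAANNNNNNN"
bounds = sequence_bounds(s)
if bounds is not None:
    start, end = bounds
    print(start, end)
    print(s[start:end])
```

For a 1-based, inclusive position pair as in the answer above, use start + 1 and end.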
Since you mentioned that you want to find the positions, the code below gives the 0-based positions where the actual sequence starts and ends in the string. Note that it raises a TypeError if the string contains nothing but 'N'.
s = 'NNNNAANNNN'
# index of the first non-'N' character
i = s.find(next((x for x in s if x != 'N'), None))
# index of the last non-'N' character
j = s.rfind(next((x for x in reversed(s) if x != 'N'), None))
print(i, j)
print(s[i:j+1])
#Output
4 5
AA
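A variant of the same scan that works on indices directly, so it does not need find and rfind, might look like the following. It is a sketch under assumptions: the function name first_last_non_pad is hypothetical, the returned positions are 0-based and inclusive (matching the output above), and None is returned when there is no non-padding character.

```python
def first_last_non_pad(s, pad="N"):
    """Return 0-based, inclusive (first, last) positions of the non-pad region.

    Hypothetical helper; returns None if every character equals pad.
    """
    # Scan from the left for the first character that is not padding.
    first = next((i for i, ch in enumerate(s) if ch != pad), None)
    if first is None:
        return None
    # Scan from the right; a match is guaranteed because first exists.
    last = next(i for i in range(len(s) - 1, -1, -1) if s[i] != pad)
    return first, last


s = "NNNNAANNNN"
positions = first_last_non_pad(s)
if positions is not None:
    i, j = positions
    print(i, j)
    print(s[i:j + 1])
```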
Use strip()
s = "NNNNNACGTGGCTAANNNNNNN"
s = s.strip('N')
print(s)