
First and last occurrence of a symbol (Python without regex)

I am dealing with a string over the 'ACGT' alphabet (a genetic sequence) padded with the letter 'N' at the beginning and at the end:

NNN...NNACGT...GGCTAANNNN...NNN

I would like to find the positions where the actual sequence begins and ends. This could easily be done with regular expressions, but I would like a simpler solution using basic Python string operations. Your suggestions will be appreciated.

To get the remainder (removing padding from left and right) it seems like all you need is:

<YourString>.strip('N')

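If you need the indices, use lstrip and rstrip instead. The values below are 1-based positions; drop the +1 to get a 0-based start index, in which case sEnd serves as the exclusive end of a slice:

sStart = len(<YourString>)-len(<YourString>.lstrip('N'))+1
sEnd = len(<YourString>.rstrip('N'))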
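The same idea can be wrapped in a small function. This is only a sketch, assuming 0-based indices with an exclusive end so the result can be used directly in a slice; the name sequence_bounds and the sample string are made up for illustration.

```python
def sequence_bounds(padded, pad="N"):
    """Return (start, end) so that padded[start:end] is the unpadded sequence."""
    # Number of pad characters removed from the left is the 0-based start.
    start = len(padded) - len(padded.lstrip(pad))
    # Length after removing right padding is the exclusive end.
    end = len(padded.rstrip(pad))
    # If the string is empty or consists only of padding, report an empty span.
    if start >= end:
        return 0, 0
    return start, end


sample = "NNNACGTGGCTAANNNN"  # hypothetical input
start, end = sequence_bounds(sample)
print(start, end, sample[start:end])
```

Returning (0, 0) for an all-padding string is one possible convention; raising an error or returning None would work just as well, depending on the caller.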

Since you mentioned that you want the 'positions', the code below gives the 0-based indices where the actual sequence starts and ends in the string.

s = 'NNNNAANNNN'

i, j = s.find(next((x for x in s if x != 'N'), None)), s.rfind(next((x for x in reversed(s) if x != 'N'), None))

print(i, j)
print(s[i:j+1])

# Output
4 5
AA
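Note that the one-liner raises a TypeError when the string contains nothing but 'N', because next() falls back to None and str.find(None) is not allowed. A possible variant that handles this case is sketched below; first_last_non_pad is a hypothetical helper name, and returning None for an all-padding string is an assumption about what the caller wants.

```python
def first_last_non_pad(s, pad="N"):
    """Return 0-based (first, last) indices of non-pad characters, or None."""
    first = next((i for i, ch in enumerate(s) if ch != pad), None)
    if first is None:
        return None  # the string is empty or consists only of padding
    # A non-pad character exists, so this backward search always succeeds.
    last = next(i for i in range(len(s) - 1, -1, -1) if s[i] != pad)
    return first, last


print(first_last_non_pad("NNNNAANNNN"))  # (4, 5)
print(first_last_non_pad("NNNN"))        # None
```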

Use strip()

    s = "NNNNNACGTGGCTAANNNNNNN"
    s = s.strip('N')
    print(s)
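One property worth keeping in mind: strip('N') only removes characters from the two ends, so any 'N' inside the sequence is left in place. A short illustration with a made-up string:

```python
s = "NNACGTNNACGTNN"  # hypothetical input with N in the middle
trimmed = s.strip("N")
print(trimmed)  # ACGTNNACGT
```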
