First and last occurrence of a symbol (Python without regex)

Question

I am dealing with a string over the 'ACGT' alphabet (a genetic sequence), padded with the letter 'N' at the beginning and at the end:

NNN...NNACGT...GGCTAANNNN...NNN

I would like to find the positions where the actual sequence begins and ends. This could easily be done with regular expressions, but I would prefer a simpler solution using basic Python string operations. Any suggestions would be appreciated.

Answer 1

To get the sequence itself, with the padding removed from both ends, all you need is:

s.strip('N')

where s is your string. If you need the indices, use lstrip and rstrip instead:

sStart = len(s) - len(s.lstrip('N')) + 1
sEnd = len(s.rstrip('N'))

These are the 1-based positions of the first and last non-'N' characters. For a 0-based start index, drop the +1; then s[sStart:sEnd] is the stripped sequence.
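The index arithmetic above can be sketched as a small runnable function. The name trimmed_bounds, its pad parameter, and the sample string are illustrative assumptions, not part of the original answer. It returns a 0-based, half-open (start, end) pair, and returns None when the string is empty or consists only of padding.

```python
def trimmed_bounds(s, pad="N"):
    """Return (start, end) such that s[start:end] == s.strip(pad).

    Hypothetical helper for illustration. Indices are 0-based and
    `end` is exclusive, following Python slice conventions.
    """
    start = len(s) - len(s.lstrip(pad))  # number of leading pad characters
    end = len(s.rstrip(pad))             # length once trailing pad is removed
    if start >= end:                     # empty string or nothing but padding
        return None
    return start, end


s = "NNNNNACGTGGCTAANNNNNNN"
bounds = trimmed_bounds(s)
if bounds is not None:
    start, end = bounds
    print(start, end)      # 5 15
    print(s[start:end])    # ACGTGGCTAA
```

Returning a half-open pair keeps the result directly usable as a slice; add 1 to start if 1-based positions are preferred.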

Answer 2

Since you mentioned that you want the positions, the code below gives the indices where the actual sequence starts and ends in the string.

s = 'NNNNAANNNN'

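# Find the first and last characters that are not 'N', then look up their positions.
# Assumes s contains at least one non-'N' character; otherwise next() returns None
# and find() raises a TypeError.
i = s.find(next((x for x in s if x != 'N'), None))
j = s.rfind(next((x for x in reversed(s) if x != 'N'), None))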

print(i, j)
print(s[i:j+1])

# Output
4 5
AA
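As a variation on the same idea, the indices can be collected directly while scanning, which avoids the second find/rfind lookup and copes with a string that is all padding. This is only a sketch: the function name first_last_non_pad and its pad parameter are hypothetical.

```python
def first_last_non_pad(s, pad="N"):
    """Return 0-based (first, last) indices of the non-pad characters, or None.

    Hypothetical helper for illustration; `last` is inclusive.
    """
    # Scan from the left for the first character that is not padding.
    first = next((i for i, ch in enumerate(s) if ch != pad), None)
    if first is None:  # empty string or nothing but padding
        return None
    # Scan from the right; a match is guaranteed because `first` exists.
    last = next(i for i in range(len(s) - 1, -1, -1) if s[i] != pad)
    return first, last


s = "NNNNAANNNN"
result = first_last_non_pad(s)
if result is not None:
    i, j = result
    print(i, j)        # 4 5
    print(s[i:j + 1])  # AA
```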

Answer 3

Use strip():

    s = "NNNNNACGTGGCTAANNNNNNN"
    s = s.strip('N')  # remove leading and trailing 'N' padding
    print(s)  # ACGTGGCTAA

3 answers

solution1
3 ACCEPTED 2020-06-04 12:18:07

solution2
1 2020-06-04 12:18:44

solution3
0 2020-06-04 12:25:19
