
How to find the position of a letter in a string based on a condition in Python

I want to find the index of a letter in a string that satisfies a certain condition. Specifically, I want the index of the letter g for which all the brackets before it are closed.

This is the string I have:

sen = 'abcd(fgji(l)jkpg((jgsdti))khgy)ghyig(a)gh'

This is what I have tried:

import re

lst = [(i.end()) for i in re.finditer('g', sen)]
# lst
# [7, 16, 20, 29, 32, 36, 40]
count_open = 0
count_close = 0
for i in lst:
    sent=sen[0:i]
    for w in sent:
        if w == '(':
            count_open += 1
        if w == ')':
            count_close += 1    
        if count_open == count_close and count_open != 0:
            c = i-1
            break

This gives me c as 39, which is the last index. However, the right answer should be 35, since the brackets before the second-to-last g are all closed.

You can dispense with regex and simply use a stack to keep track of whether or not your parens are balanced while you iterate over the characters:

In [4]: def find_balanced_gs(sen):
   ...:     stack = []
   ...:     for i, c in enumerate(sen):
   ...:         if c == "(":
   ...:             stack.append(c)
   ...:         elif c == ")":
   ...:             stack.pop()
   ...:         elif c == 'g':
   ...:             if len(stack) == 0:
   ...:                 yield i
   ...:

In [5]: list(find_balanced_gs(sen))
Out[5]: [31, 35, 39]

Using a stack here is the "classic" way of checking for balanced parens. It's been a while since I've implemented it from scratch, so there might be some edge cases that I haven't considered (for example, an unmatched ")" makes stack.pop() raise IndexError). But this should be a good start. I've made a generator, but you can make it a normal function that returns a list of indices, the first such index, or the last such index.
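As a rough sketch of the "normal function" variant mentioned above, the version below returns a list and swaps the stack for a simple depth counter. The name balanced_indices, the letter parameter, and the choice to raise ValueError on an unmatched ")" are assumptions made for illustration, not part of the original answer.

```python
def balanced_indices(sen, letter="g"):
    """Return the indices of `letter` that occur while no bracket is open."""
    depth = 0   # number of '(' seen so far that are not yet closed
    found = []
    for i, c in enumerate(sen):
        if c == "(":
            depth += 1
        elif c == ")":
            if depth == 0:
                # Assumption: treat a stray ')' as an error rather than ignore it
                raise ValueError(f"unmatched ')' at index {i}")
            depth -= 1
        elif c == letter and depth == 0:
            found.append(i)
    return found


sen = 'abcd(fgji(l)jkpg((jgsdti))khgy)ghyig(a)gh'
print(balanced_indices(sen))  # [31, 35, 39]
```

Like the generator, this also accepts a g that appears before any bracket has been opened, which differs from the count_open != 0 check in the question.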

Keeping your idea, just a few things were off, see comments:

import re

sen='abcd(fgji(l)jkpg((jgsdti))khgy)ghyig(a)gh'


lst=[ (i.end()) for i in re.finditer('g', sen)]
#lst
#[7, 16, 20, 29, 32, 36, 40]

for i in lst:
    # You have to reset the count for every i
    count_open= 0
    count_close=0
    sent=sen[0:i]
    for w in sent:
        if w=='(':
            count_open+=1
        if w==')':
            count_close+=1    
    # And iterate over all of sent before comparing the counts
    if count_open == count_close and count_open != 0:
        c=i-1
        break
print(c)
# 31 - actually the right answer, not 35

But this is not very efficient, as you iterate many times over the same parts of the string. You can make it more efficient, iterating only once over the string:

sen='abcd(fgji(l)jkpg((jgsdti))khgy)ghyig(a)gh'

def find(letter, string):
    count_open = 0
    count_close = 0
    for (index, char) in enumerate(string):
        if char == '(':
            count_open += 1
        elif char == ')':
            count_close += 1
        elif char == letter and count_close == count_open and count_open > 0:
            return index
    else:
        raise ValueError('letter not found')

find('g', sen)
# 31
find('a', sen)
# ...
# ValueError: letter not found

@Thierry Lathuille's answer is perfectly good. Here I'm just suggesting some minor variations without claiming they are better:

import re

out = []    # collect all valid 'g'
ocount = 0  # only store the difference between open and closed
for m in re.finditer(r'[()g]', sen):    # use re to preselect
    L = m.group()
    ocount += {'(':1, ')':-1, 'g':0}[L] # save a bit of typing
    assert ocount >= 0                  # enforce some grammar if you like
    if L == 'g' and ocount == 0:
        out.append(m.start())

out
# [31, 35, 39]

This is a simpler adaptation of the code in the OP (and takes into account the condition count_open != 0):

def get_idx(f, sen):
    idx = []
    count_open= 0
    count_close=0

    for i, w in enumerate(sen):
        if w == '(':
            count_open += 1
        if w == ')':
            count_close += 1    
        if count_open == count_close and count_open != 0:
            if w == f:
                idx.append(i)

    return idx

get_idx('g', sen)

Out:

[31, 35, 39]

You can use .index() to find the position of a substring within a string, or of an element within a list. Calling stringvar.index(substring) returns the offset of the first occurrence of substring.
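A minimal sketch of how str.index behaves on the string from the question. It only locates occurrences and knows nothing about brackets, so the bracket condition would still have to be checked separately. The variable names first and second are illustrative.

```python
sen = 'abcd(fgji(l)jkpg((jgsdti))khgy)ghyig(a)gh'

first = sen.index('g')               # offset of the first 'g'
second = sen.index('g', first + 1)   # search again, starting just after the first hit
print(first, second)                 # 6 15

# str.index raises ValueError when nothing is found; str.find returns -1 instead
print(sen.find('z'))                 # -1
```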
