简体   繁体   中英

Python - find index position of first occurrence of a list of strings within a string

I would like to search some text for the index of the first occurrence of a set of strings (say "-->" or "--x" or "--XX") once found, I would need to know where the start position of the found string, and the particular string that was found (more specifically the length of the identified string)

This is what i have so far.. but its not enough. Please help.

arrowlist = {"->x","->","->>","-\","\\-","//--","->o","o\\--","<->","<->o"}
def cxn(line,arrowlist):
   if any(x in line for x in arrowlist):
      print("found an arrow {} at position {}".format(line.find(arrowlist),2))
   else:
      return 0 

maybe regex would be easier, but i'm really struggling since the arrow list could be dynamic and the length of the arrow strings could also be variable.

Thanks!

I like this solution, inspired from this post:

How to use re match objects in a list comprehension

import re

arrowlist = ["xxx->x", "->", "->>", "-\"","\\-"," // --","x->o", "-> ->"]

lines = ["xxx->x->->", "-> ->", "xxx->x", "xxxx->o"]

def filterPick(list,filter):
    return [(m.group(), item_number, m.start()) for item_number,l in enumerate(list) for m in (filter(l),) if m]


if __name__ == '__main__':

    searchRegex = re.compile(r''+ '|'.join(arrowlist) ).search
    x = filterPick(lines, searchRegex)
    print(x)

Output shows:

[('xxx->x', 0, 0), ('->', 1, 0), ('xxx->x', 2, 0), ('x->o', 3, 3)]

First number being the list index and second the start index of the string.

Following along with your example's logic, this jumped out as the most expedient method of finding the "first" matching arrow and printing it's location. However, the order of sets are not FIFO, so if you want to preserve order I would suggest substituting a list instead of a set for arrowlist so that the order can be preserved.

    arrowlist = {"->x","->", "->>", "-\\", "\\-","//--","->o","o\\--","<->","<->o"}
    def cxn(line, arrowlist):
       try:
           result = tuple((x, line.find(x)) for x in arrowlist if x in line)[0]
           print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))

       # Remember in general it's not a great idea to use an exception as
       # broad as Exception, this is just for example purposes.
       except Exception:
          return 0

If you're looking for the first match in the provided string (line), you can do that like this:

arrowlist = {"->x","->", "->>", "-\\", "\\-","//--","->o","o\\--","<->","<->o"}

def cxn(line, arrowlist):
   try:
       # key first sorts on the position in string then shortest length 
       # to account for multiple arrow matches (i.e. -> and ->x)
       result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=lambda r: (r[1],len(r[0])))[0]
       # if you would like to match the "most complete" (i.e. longest-length) word first use:
       # result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=lambda r: (r[1], -len(r[0])))[0]
       print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))

   except Exception:
      return 0

Or, if you have access to the standard library you can use operator.itemgetter to almost the same effect and gain efficiency from less function calls:

from operator import itemgetter

arrowlist = {"->x","->", "->>", "-\\", "\\-","//--","->o","o\\--","<->","<->o"}

def cxn(line, arrowlist):
   try:
       # key first sorts on the position in string then alphanumerically 
       # on the arrow match (i.e. -> and ->x matched in same position
       # will return -> because when sorted alphanumerically it is first)
       result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=(itemgetter(1,0)))[0]
       print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))

   except Exception:
      return 0

***NOTE: I am using a slightly different arrowlist than your example just because the one you provided seems to be messing with the default code formatting (likely because of quote closure issues). Remember you can prepend a string with 'r' like this: r"Text that can use special symbols like the escape \\and\\ be read in as a 'raw' string literal\\" . See this question for more information about raw string literals.

You could do something like

count = 0
for item in arrowlist:
    count += 1
    if item in line:
        print("found an arrow {} at position {}".format(item,count))

wanted to post the answer that I came up with (from the combination of feedback) as you can see, this result -- be it really verbose and very inefficient will return the correct arrow string found at the correct position index. --

arrowlist = ["xxx->x", "->", "->>", "xxx->x","x->o", "xxx->"]
doc =""" @startuml
    n1 xxx->xx n2 : should not find
    n1 ->> n2 : must get the third arrow
    n2  xxx-> n3 : last item
    n3   -> n4 : second item
    n4    ->> n1 : third item"""

def checkForArrow(arrows,line):
    for a in arrows:
        words = line.split(' ')
        for word in words:
            if word == a:
                return(arrows.index(a),word,line.index(word))

for line in iter(doc.splitlines()):
    line = line.strip()
    if line != "":
        print (checkForArrow(arrowlist,line))

returns the following results: (index of item in arrowlist, the string found, index position of text in the line)

None
None
(2, '->>', 3)
(5, 'xxx->', 4)
(1, '->', 5)
(2, '->>', 6)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM