Python - find index position of first occurrence of a list of strings within a string

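I want to search some text and find the index of the first occurrence of any one of a set of strings (say "->" or "--x" or "--XX"). Once one is found, I need to know the position where the found string starts, and which particular string was found (more specifically, the length of the identified string).

This is what I have so far... but it's not enough. Please help.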

arrowlist = {"->x","->","->>","-\","\\-","//--","->o","o\\--","<->","<->o"}
def cxn(line,arrowlist):
   if any(x in line for x in arrowlist):
      print("found an arrow {} at position {}".format(line.find(arrowlist),2))
   else:
      return 0 

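Perhaps a regex would make this easier, but I'm really struggling because the arrow list could be dynamic and the length of the arrow strings could also vary.

Thanks!

I like this solution, inspired by this post:

How do I use re match objects in a list comprehension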

import re

arrowlist = ["xxx->x", "->", "->>", "-\"","\\-"," // --","x->o", "-> ->"]

lines = ["xxx->x->->", "-> ->", "xxx->x", "xxxx->o"]

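def filterPick(lines, search):
    # Parameters renamed so they no longer shadow the built-ins list and filter.
    # Returns (matched arrow, index of the line, start of the match) for every line that matches.
    return [(m.group(), item_number, m.start()) for item_number, l in enumerate(lines) for m in (search(l),) if m]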


if __name__ == '__main__':

    # re.escape keeps characters such as the backslash from being read as regex syntax
    searchRegex = re.compile('|'.join(re.escape(a) for a in arrowlist)).search
    x = filterPick(lines, searchRegex)
    print(x)

The output shows:

[('xxx->x', 0, 0), ('->', 1, 0), ('xxx->x', 2, 0), ('x->o', 3, 3)]

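In each tuple, the first number is the index of the line in the list and the second is the start index of the match within that line.

Following the logic of your example, this turned out to be the most expedient way to find the "first" matching arrow and print its position. However, sets are not FIFO ordered, so if you want to preserve the order I suggest using a list instead of a set for the arrow list.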
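One thing to watch with a plain alternation is that the regex engine tries the alternatives left to right, so "->" listed before "->>" wins even where "->>" is present. The following is only a sketch of one way around that, assuming the longest arrow at a given position is the one wanted; the names build_arrow_regex and find_first_arrow are made up for this illustration.

```python
import re

def build_arrow_regex(arrows):
    # Longest first, so "->>" is tried before "->" at the same position.
    ordered = sorted(set(arrows), key=len, reverse=True)
    # re.escape makes every arrow match literally, whatever characters it contains.
    return re.compile("|".join(re.escape(a) for a in ordered))

def find_first_arrow(line, arrow_regex):
    # Returns (arrow, start index, length) for the leftmost match, or None.
    m = arrow_regex.search(line)
    if m is None:
        return None
    return m.group(), m.start(), len(m.group())

arrows = ["->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"]
arrow_regex = build_arrow_regex(arrows)

print(find_first_arrow("n1 ->> n2", arrow_regex))        # ('->>', 3, 3)
print(find_first_arrow("a <->o b", arrow_regex))         # ('<->o', 2, 4)
print(find_first_arrow("no arrows here", arrow_regex))   # None
```

Because the pattern is built from whatever list is passed in, the arrow list can change at runtime; only the compile step has to be repeated.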

arrowlist = {"->x","->", "->>", "-\\", "\\-","//--","->o","o\\--","<->","<->o"}
def cxn(line, arrowlist):
   try:
       result = tuple((x, line.find(x)) for x in arrowlist if x in line)[0]
       print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))

   # Remember in general it's not a great idea to use an exception as
   # broad as Exception, this is just for example purposes.
   except Exception:
       return 0
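To make the ordering point concrete, here is a small sketch (the function name first_in_list_order is hypothetical) that takes a list and returns the first arrow in list order that occurs in the line. With a list, the caller decides which arrow wins when several are present; with a set, the iteration order is not something to rely on.

```python
def first_in_list_order(line, arrows):
    # arrows is a list, so iteration follows the order the caller wrote.
    for arrow in arrows:
        pos = line.find(arrow)      # -1 when the arrow is not in the line
        if pos != -1:
            return arrow, pos, len(arrow)
    return None

print(first_in_list_order("a ->x b", ["->", "->x"]))   # ('->', 2, 2)
print(first_in_list_order("a ->x b", ["->x", "->"]))   # ('->x', 2, 3)
```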

If you are looking for the first match within the provided string (line), you can do it like this:

arrowlist = {"->x","->", "->>", "-\\", "\\-","//--","->o","o\\--","<->","<->o"}

def cxn(line, arrowlist):
   try:
       # key first sorts on the position in string then shortest length 
       # to account for multiple arrow matches (i.e. -> and ->x)
       result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=lambda r: (r[1],len(r[0])))[0]
       # if you would like to match the "most complete" (i.e. longest-length) word first use:
       # result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=lambda r: (r[1], -len(r[0])))[0]
       print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))

   except Exception:
      return 0
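Since only the first element of the sorted list is used, the same idea can be written with min, which avoids sorting everything and needs no try/except. This is a sketch under that assumption rather than a replacement for the code above; first_arrow and prefer_longest are names invented here.

```python
def first_arrow(line, arrows, prefer_longest=True):
    # Collect (position, arrow) for every arrow that occurs in the line.
    candidates = [(line.find(a), a) for a in arrows if a in line]
    if not candidates:
        return None
    # Earliest position wins; ties are broken by length (longest or shortest).
    sign = -1 if prefer_longest else 1
    pos, arrow = min(candidates, key=lambda c: (c[0], sign * len(c[1])))
    return arrow, pos, len(arrow)

arrowlist = {"->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"}

print(first_arrow("a ->x b", arrowlist))                         # ('->x', 2, 3)
print(first_arrow("a ->x b", arrowlist, prefer_longest=False))   # ('->', 2, 2)
print(first_arrow("nothing", arrowlist))                         # None
```

Two different arrows of the same length cannot both start at the same position, so the result does not depend on the set's iteration order.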

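Or, if you have access to the standard library, you can use operator.itemgetter to get nearly the same effect, and gain some efficiency through fewer function calls: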

from operator import itemgetter

arrowlist = {"->x","->", "->>", "-\\", "\\-","//--","->o","o\\--","<->","<->o"}

def cxn(line, arrowlist):
   try:
       # key first sorts on the position in string then alphanumerically 
       # on the arrow match (i.e. -> and ->x matched in same position
       # will return -> because when sorted alphanumerically it is first)
       result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=(itemgetter(1,0)))[0]
       print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))

   except Exception:
      return 0

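***Note: the arrow list I used differs slightly from the one in your example, because the one you provided seems to break the default code formatting (probably a quote-closing problem). Remember that you can prefix a string with 'r': r"Text that can use special symbols like the escape \\and\\ be read in as a 'raw' string literal\\" For more information on raw string literals, see this question.

You could do something like this: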
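As a small illustration of the quoting issue, the snippet below compares escaped and raw spellings of some of the arrows. It also shows the one case a raw literal does not cover: a raw string cannot end in a single backslash, so the arrow that ends in a backslash still needs the escaped form.

```python
# Each pair spells the same string, once with an escaped backslash and once as a raw literal.
print("\\-" == r"\-")        # True
print("o\\--" == r"o\--")    # True

# Writing r"-\" would be a syntax error, because the backslash keeps the
# closing quote from ending the literal. Use the escaped form instead.
trailing = "-\\"
print(len(trailing))         # 2
print(trailing)              # -\
```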

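for item in arrowlist:
    pos = line.find(item)  # -1 when item does not occur in line
    if pos != -1:
        # report the position within line, not the item's index in arrowlist
        print("found an arrow {} at position {}".format(item, pos))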

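I wanted to post the answer I came up with (a combination of the feedback). As you can see from the result, it is very verbose and very inefficient, but it returns the correct arrow string found at the correct position index.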

arrowlist = ["xxx->x", "->", "->>", "xxx->x","x->o", "xxx->"]
doc =""" @startuml
    n1 xxx->xx n2 : should not find
    n1 ->> n2 : must get the third arrow
    n2  xxx-> n3 : last item
    n3   -> n4 : second item
    n4    ->> n1 : third item"""

def checkForArrow(arrows,line):
    for a in arrows:
        words = line.split(' ')
        for word in words:
            if word == a:
                return(arrows.index(a),word,line.index(word))

for line in iter(doc.splitlines()):
    line = line.strip()
    if line != "":
        print (checkForArrow(arrowlist,line))

It returns the following results: (index of the item in the arrow list, the string found, index position of the text in the line)

None
None
(2, '->>', 3)
(5, 'xxx->', 4)
(1, '->', 5)
(2, '->>', 6)
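For comparison, the whole-word matching above can also be sketched with a single compiled regex that only accepts an arrow bounded by whitespace or by the ends of the line. The helper names below are made up for this sketch. It behaves slightly differently from the loop: it returns the leftmost arrow token in the line rather than the first arrow in list order, and it treats any whitespace as a separator, not only a space.

```python
import re

def build_token_regex(arrows):
    # Longest alternatives first; re.escape makes each arrow match literally.
    ordered = sorted(set(arrows), key=len, reverse=True)
    alternatives = "|".join(re.escape(a) for a in ordered)
    # (?<!\S) and (?!\S): no non-whitespace character directly before or after.
    return re.compile(r"(?<!\S)(?:" + alternatives + r")(?!\S)")

def check_for_arrow(arrows, line, token_regex):
    m = token_regex.search(line)
    if m is None:
        return None
    # (index of the arrow in the list, the arrow, start position in the line)
    return arrows.index(m.group()), m.group(), m.start()

arrowlist = ["xxx->x", "->", "->>", "xxx->x", "x->o", "xxx->"]
token_regex = build_token_regex(arrowlist)

for line in ["n1 xxx->xx n2 : should not find",
             "n1 ->> n2 : must get the third arrow",
             "n2  xxx-> n3 : last item"]:
    print(check_for_arrow(arrowlist, line, token_regex))
# None
# (2, '->>', 3)
# (5, 'xxx->', 4)
```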
