Python - find index position of first occurrence of a list of strings within a string

I would like to search some text for the index of the first occurrence of any of a set of strings (say "-->", "--x" or "--XX"). Once found, I need to know the start position of the found string and which particular string was found (more specifically, the length of the identified string).

This is what I have so far, but it's not enough. Please help.

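# Note: "-\" escapes its own closing quote, so this literal does not parse as intended ("-\\" was probably meant)
arrowlist = {"->x","->","->>","-\","\\-","//--","->o","o\\--","<->","<->o"}
def cxn(line, arrowlist):
    # any() only tells you *whether* some arrow occurs, not which one
    if any(x in line for x in arrowlist):
        # str.find() needs a single substring, so passing the whole set raises TypeError;
        # the hard-coded 2 is not the real position
        print("found an arrow {} at position {}".format(line.find(arrowlist), 2))
    else:
        return 0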

Maybe a regex would be easier, but I'm really struggling, since the arrow list could be dynamic and the arrow strings could vary in length.

Thanks!
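Since the arrow list can change at run time, one way to handle it with a regex is to build the pattern from the list itself. The sketch below (the name first_arrow is hypothetical) escapes each arrow with re.escape so characters such as "\" and "-" are matched literally, and sorts the alternatives longest-first, because at a given position Python's re takes the first alternative that matches rather than the longest one.

```python
import re

def first_arrow(line, arrows):
    """Return (start, arrow, length) for the earliest arrow in line, or None.

    Hypothetical helper: arrows can be any iterable of strings, so the
    pattern is rebuilt from whatever list is passed in.
    """
    # Longest first, so "->>" wins over "->" when both start at the same index
    alternatives = sorted(arrows, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(a) for a in alternatives))
    m = pattern.search(line)  # leftmost match in the line
    if m is None:
        return None
    return m.start(), m.group(), len(m.group())


arrowlist = ["->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"]
print(first_arrow("a ->> b", arrowlist))    # (2, '->>', 3)
print(first_arrow("no arrows", arrowlist))  # None
```

Compiling once per call keeps the sketch short; if the list changes rarely, the compiled pattern could be cached and reused across lines.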

I like this solution, inspired by this post:

How to use re match objects in a list comprehension

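import re

arrowlist = ["xxx->x", "->", "->>", "-\"", "\\-", " // --", "x->o", "-> ->"]

lines = ["xxx->x->->", "-> ->", "xxx->x", "xxxx->o"]

def filterPick(items, search):
    # run the search once per string; keep (matched text, item index, match start)
    return [(m.group(), item_number, m.start())
            for item_number, l in enumerate(items)
            for m in (search(l),) if m]


if __name__ == '__main__':
    # re.escape makes every arrow match literally (unescaped, "\\-" would match any "-");
    # at a given position the alternation takes the first listed arrow that matches
    searchRegex = re.compile('|'.join(map(re.escape, arrowlist))).search
    x = filterPick(lines, searchRegex)
    print(x)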

Output:

[('xxx->x', 0, 0), ('->', 1, 0), ('xxx->x', 2, 0), ('x->o', 3, 3)]

In each tuple, the first number is the index of the string in lines and the second is the start index of the match within that string.
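The question also asks for the length of the arrow that was found. A small sketch that unpacks the tuples printed above and adds len() of the matched text (the message format is only illustrative):

```python
# Output of filterPick from above: (matched arrow, index in lines, start position)
results = [('xxx->x', 0, 0), ('->', 1, 0), ('xxx->x', 2, 0), ('x->o', 3, 3)]

messages = [
    # len(arrow) is the length of the arrow that was found
    "lines[{}]: found an arrow {} at position {} with length {}".format(
        item_number, arrow, start, len(arrow))
    for arrow, item_number, start in results
]
print("\n".join(messages))
```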

Following along with your example's logic, this seemed the most expedient way to find the "first" matching arrow and print its location. However, sets have no guaranteed order, so "first" here just means whichever arrow the set happens to yield first. If you want a predictable order, I would suggest using a list instead of a set for arrowlist.

arrowlist = {"->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"}

def cxn(line, arrowlist):
    try:
        # first arrow, in the set's iteration order, that occurs anywhere in line
        result = tuple((x, line.find(x)) for x in arrowlist if x in line)[0]
        print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))

    # The tuple is empty when no arrow occurs in line, so [0] raises IndexError.
    # Catching that specific exception (rather than Exception) avoids hiding other bugs.
    except IndexError:
        return 0
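Following the suggestion to use a list, here is a sketch (the name first_listed_arrow is hypothetical) that uses next() with a default of None instead of indexing into a tuple, so no exception handling is needed. Note that it still returns the first arrow in list order that occurs in the line, not the arrow that appears earliest in the line:

```python
arrowlist = ["->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"]

def first_listed_arrow(line, arrows):
    # First arrow, in the order of `arrows`, that occurs anywhere in line;
    # None if there is no such arrow
    return next(((x, line.find(x)) for x in arrows if x in line), None)

print(first_listed_arrow("a ->> b ->x", arrowlist))  # ('->x', 8)
print(first_listed_arrow("plain text", arrowlist))   # None
```

In the first call "->x" is reported even though "->>" starts earlier in the line, simply because "->x" comes first in the list.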

If you're looking for the first match in the provided string (line), you can do it like this:

arrowlist = {"->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"}

def cxn(line, arrowlist):
    try:
        # key first sorts on the position in the string, then on the shortest length,
        # to account for multiple arrows matching at the same position (e.g. -> and ->x)
        result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=lambda r: (r[1], len(r[0])))[0]
        # if you would like to match the "most complete" (i.e. longest) arrow first, use:
        # result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=lambda r: (r[1], -len(r[0])))[0]
        print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))

    # the list is empty when no arrow occurs in line, so [0] raises IndexError
    except IndexError:
        return 0
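Since only the smallest element is needed, min() with the same key gives the same result without sorting the whole list, and its default argument (available since Python 3.4) replaces the try/except. A sketch with the hypothetical name earliest_arrow:

```python
arrowlist = {"->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"}

def earliest_arrow(line, arrows):
    # (arrow, position of its first occurrence) for every arrow found in line
    found = ((x, line.find(x)) for x in arrows if x in line)
    # Earliest position first; ties go to the shorter arrow. None if nothing matched
    return min(found, key=lambda r: (r[1], len(r[0])), default=None)

print(earliest_arrow("a ->> b ->x", arrowlist))  # ('->', 2)
print(earliest_arrow("plain text", arrowlist))   # None
```

The result does not depend on the set's iteration order, because two different arrows can never share both the same position and the same length.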

Or you can use operator.itemgetter from the standard library for almost the same effect, and gain some efficiency from fewer Python-level function calls (itemgetter replaces the lambda). Here, ties at the same position are broken alphabetically rather than by length:

from operator import itemgetter

arrowlist = {"->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"}

def cxn(line, arrowlist):
    try:
        # key first sorts on the position in the string, then alphabetically
        # on the arrow (e.g. -> and ->x matched at the same position
        # returns -> because it sorts first alphabetically)
        result = sorted([(x, line.find(x)) for x in arrowlist if x in line], key=itemgetter(1, 0))[0]
        print("found an arrow {} at position {} with length {}".format(result[0], result[1], len(result[0])))

    # the list is empty when no arrow occurs in line, so [0] raises IndexError
    except IndexError:
        return 0
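In fact the two keys always pick the same arrow: two arrows that find() reports at the same index both start there, so the shorter one is a prefix of the longer one and therefore also sorts first alphabetically. A quick check over a few sample lines (chosen only for illustration):

```python
from operator import itemgetter

arrowlist = {"->x", "->", "->>", "-\\", "\\-", "//--", "->o", "o\\--", "<->", "<->o"}
# Sample lines chosen for illustration; each contains at least one arrow
samples = ["a ->> b", "x <->o y", "//-- then ->x", "o\\-- first"]

for line in samples:
    found = [(x, line.find(x)) for x in arrowlist if x in line]
    by_length = min(found, key=lambda r: (r[1], len(r[0])))  # position, then shortest
    by_name = min(found, key=itemgetter(1, 0))                 # position, then alphabetical
    print(repr(line), by_length, by_name, by_length == by_name)
```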

NOTE: I am using a slightly different arrowlist than in your example, because the one you provided breaks the code formatting (most likely because of quote-closure issues). Remember that you can prefix a string with 'r' to make it a raw string literal, in which backslashes are kept as typed: r"Text that can use special symbols like the escape \\and\\ be read in as a 'raw' string literal\\". See this question for more information about raw string literals.
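A short illustration of raw strings applied to the backslash arrows. One limitation worth knowing: a raw string cannot end in a single backslash, because that backslash still escapes the closing quote, so an arrow such as -\ has to keep the ordinary escaped form:

```python
# A raw string keeps backslashes as typed, so each pair below is identical
print(r"\-" == "\\-")      # True
print(r"o\--" == "o\\--")  # True

# r"-\" would be a syntax error (the backslash escapes the closing quote),
# so an arrow that ends in a backslash needs the escaped form
arrow = "-\\"
print(len(arrow), arrow)   # 2 -\
```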

You could do something like:

for item in arrowlist:
    if item in line:
        # line.find() gives the index where item first occurs in line
        print("found an arrow {} at position {}".format(item, line.find(item)))
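If the index of the arrow within arrowlist is wanted as well, enumerate() can stand in for a manual counter. A sketch with sample values (both arrowlist and line here are illustrative):

```python
arrowlist = ["->x", "->", "->>", "<->"]
line = "n1 <-> n2 ->x n3"

hits = []
for index, item in enumerate(arrowlist):
    if item in line:
        # index: position in arrowlist; line.find(item): where item first occurs in line
        hits.append((index, item, line.find(item)))
print(hits)  # [(0, '->x', 10), (1, '->', 4), (3, '<->', 3)]
```

Note that "->" is reported at 4 because it first occurs inside "<->".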

I wanted to post the answer I came up with from the combined feedback. As you can see, although it is really verbose and very inefficient, it returns the correct arrow string found at the correct position index:

arrowlist = ["xxx->x", "->", "->>", "xxx->x", "x->o", "xxx->"]
doc =""" @startuml
    n1 xxx->xx n2 : should not find
    n1 ->> n2 : must get the third arrow
    n2  xxx-> n3 : last item
    n3   -> n4 : second item
    n4    ->> n1 : third item"""

def checkForArrow(arrows, line):
    words = line.split(' ')
    # arrows are tried in list order; the first one equal to a whole word wins
    for a in arrows:
        for word in words:
            if word == a:
                return (arrows.index(a), word, line.index(word))
    # falls through and returns None when no word is an arrow

for line in doc.splitlines():
    line = line.strip()
    if line != "":
        print(checkForArrow(arrowlist, line))

This returns the following results (index of the item in arrowlist, the string found, index position of the text in the line):

None
None
(2, '->>', 3)
(5, 'xxx->', 4)
(1, '->', 5)
(2, '->>', 6)
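Since this approach compares whole words, a dictionary lookup per word avoids the nested loops and the repeated arrows.index() calls. The sketch below (the name check_for_arrow_fast is hypothetical) keeps the same rule of preferring the arrow that comes first in arrows, and uses re.finditer to get each word's own start position. It gives the same results for the sample doc above. Unlike line.index(word), it reports where the matching word itself starts, even if the same text also appears earlier inside another word.

```python
import re

def check_for_arrow_fast(arrows, line):
    # Map each arrow to its first index in arrows (same as arrows.index(a))
    priority = {}
    for i, a in enumerate(arrows):
        priority.setdefault(a, i)
    # Whitespace-separated words with their start positions, kept only if they are arrows
    hits = [(priority[m.group()], m.group(), m.start())
            for m in re.finditer(r"\S+", line) if m.group() in priority]
    # Lowest index in arrows wins; None when no word is an arrow
    return min(hits, default=None)

arrowlist = ["xxx->x", "->", "->>", "xxx->x", "x->o", "xxx->"]
doc = """ @startuml
    n1 xxx->xx n2 : should not find
    n1 ->> n2 : must get the third arrow
    n2  xxx-> n3 : last item
    n3   -> n4 : second item
    n4    ->> n1 : third item"""

for line in doc.splitlines():
    line = line.strip()
    if line:
        print(check_for_arrow_fast(arrowlist, line))
```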
