
Python to search a string for the first occurrence of any item in a list

I have to parse a few thousand txt documents using Python, but right now I'm just getting the code working for one.

I am trying to find the first time any month (January, February, March, etc.) appears in the document, and return the position of that first month. Every document has at least one month in it, but some have many.

This currently works, but seems very cumbersome:

# read the whole document into one string; the file is closed automatically
with open('2.txt', 'r') as f:
    mytext = f.read()

January = mytext.find("January")
February = mytext.find("February")
March = mytext.find("March")
April = mytext.find("April")
May = mytext.find("May")
June = mytext.find("June")
July = mytext.find("July")
August = mytext.find("August")
September = mytext.find("September")
October = mytext.find("October")
November = mytext.find("November")
December = mytext.find("December")

monthpos = [January, February, March, April, May, June, July, August, September, October, November, December]
monthpos = [x for x in monthpos if x != -1]
print(min(monthpos))  # prints the position of the first match as a number

I would like to combine something like any() and find() to get the job done, but there doesn't seem to be a better way to do this. I found this question, but it isn't very clear, so it didn't help much. I know the following is wrong and will not work, for several reasons, but it shows what I want to do:

with open('text.txt', 'r') as f:
    mytext = f.read()
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
print(mytext.find(months))  # wished-for behavior: find the first time any month is matched
# (in reality str.find() does not accept a list and raises TypeError)
# desired output: 1945, the location in the string where the first month is found

Thanks in advance.

I think this would do what you want:

months = ["January", "February", "March", "April", 
          "May", "June", "July", "August", 
          "September", "October", "November", "December"]
# s is the text to search (mytext in the question)
indices = [s.find(month) for month in months]
first = min(index for index in indices if index > -1)

First, we find the first appearance of each month (or -1 if it is not present), then we take the minimum of the indices, ignoring any that are -1. This will raise a ValueError if no month is found, which may or may not be what you want.
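If a document might contain no month at all, the ValueError can be avoided by giving min() a default value. This is only a minimal sketch: it assumes Python 3.4 or later (where min() accepts default=), and the helper name first_month_index and the sample strings are made up for illustration.

```python
MONTHS = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]

def first_month_index(s, default=-1):
    """Return the lowest index at which any month name starts in s.

    Returns `default` instead of raising ValueError when no month is found.
    (Hypothetical helper, not part of the original answer.)
    """
    indices = (s.find(month) for month in MONTHS)
    # drop the -1 "not found" results before taking the minimum
    return min((i for i in indices if i > -1), default=default)

print(first_month_index("Signed on 4 July, effective August 1."))
print(first_month_index("no dates here"))
```

Returning -1 mirrors what str.find() does for a single substring; pass default=None if a sentinel that cannot be mistaken for an index is preferred.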


As Two-Bit Alchemist has commented, you could short-circuit the search for efficiency:

months = ["January", "February", "March", "April", 
          "May", "June", "July", "August", 
          "September", "October", "November", "December"]
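first = None
for month in sorted(months, key=len):  # try the shortest names first
    i = s[:first].find(month)  # only search the part before the best match so far
    if i != -1:
        # check for None first: comparing an int with None fails on Python 3
        if first is None or i < first:
            first = i
        if i < len(month):  # not enough room for any remaining months
            break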
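To sanity-check the short-circuit loop, it can be wrapped in a function and compared against the simpler min-based version. The sketch below uses made-up function names and sample text. One caveat, offered as an observation rather than something the answer states: slicing with s[:first] only finds matches that lie entirely before the current best position. That is safe for capitalized month names, since none can overlap another, but it may miss matches for an arbitrary word list.

```python
MONTHS = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]

def first_month_simple(s, months=MONTHS):
    """Earliest position of any month in s, or None (scans all of s each time)."""
    hits = [i for i in (s.find(m) for m in months) if i != -1]
    return min(hits) if hits else None

def first_month_shortcut(s, months=MONTHS):
    """Same result for month names, but searches a shrinking prefix of s."""
    first = None
    for month in sorted(months, key=len):
        i = s[:first].find(month)  # s[:None] is the whole string
        if i != -1:
            first = i  # a hit inside s[:first] is always earlier than first
            if i < len(month):  # remaining names are at least this long
                break
    return first

sample = "Filed in December, amended on 3 May and again in June."
print(first_month_simple(sample), first_month_shortcut(sample))
```

Both functions return None when no month is present, so the caller has to handle that case explicitly.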

I would use re for conceptual simplicity. It's also easy to extend the code to do something more complex if you need to later.

import re
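with open('text.txt', 'r') as f:
    mytext = f.read()
months = ["January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"]
# join the names into one alternation pattern: January|February|...
match_obj = re.search("|".join(months), mytext)
# re.search returns None if no month is present; every document here has at least one
print(match_obj.start())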

http://docs.python.org/2/library/re.html
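As one example of extending the regex approach, the match object also reports which month was found, and a compiled pattern can be reused across many documents. This is a rough sketch under stated assumptions: the \b word-boundary anchors are an addition (so that "Maybe" does not count as "May") and are not part of the answer above, and find_first_month and the sample strings are hypothetical.

```python
import re

MONTHS = ["January", "February", "March", "April",
          "May", "June", "July", "August",
          "September", "October", "November", "December"]

# Compile once and reuse; re.escape is not strictly needed for month names
# but keeps the pattern safe if the list ever contains special characters.
MONTH_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, MONTHS)) + r")\b")

def find_first_month(text):
    """Return (position, month name) for the first month in text, or None."""
    match = MONTH_RE.search(text)
    if match is None:
        return None
    return match.start(), match.group(0)

print(find_first_month("Meeting moved from March 3 to April 9."))
print(find_first_month("Maybe later"))
```

For a few thousand files, the same function can be called on the contents of each file in a loop, so the pattern is built only once.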


 