簡體   English   中英

Python:如何使用關鍵字列表在文本中搜索字符串

[英]Python: How to use list of keywords to search for a string in a text

所以我正在編寫一個循環多個.txt 文件並搜索任意數量的預先指定的關鍵字的程序。 我很難找到一種方法來通過要搜索的關鍵字列表。

下面的代碼當前返回以下錯誤:

TypeError: 'in <string>' requires string as left operand, not list

我知道該錯誤是由關鍵字列表引起的,但我不知道如何在不運行此錯誤的情況下輸入大量關鍵字。

當前代碼:

from os import listdir

keywords=['Example', 'Use', 'Of', 'Keywords']
 
with open("/home/user/folder/project/result.txt", "w") as f:
    for filename in listdir("/home/user/folder/project/data"):
        with open('/home/user/folder/project/data/' + filename) as currentFile:
            text = currentFile.read()
            #Error Below
            if (keywords in text):
                f.write('Keyword found in ' + filename[:-4] + '\n')
            else:
                f.write('No keyword in ' + filename[:-4] + '\n')

錯誤在注釋部分下的上述代碼的第 10 行中指示。 我不確定為什么我不能調用列表來搜索關鍵字。 任何幫助表示贊賞,謝謝!

嘗試遍歷列表以查看每個元素是否在文本中

for i in range(0, len(keywords)):
    if keywords[i] in text:
        f.write('Keyword found in ' + filename[:-4] + '\n')
        break
    else:
        f.write('No keyword in ' + filename[:-4] + '\n')
        break

您也不能使用in查看列表是否在字符串中

我會使用正則表達式,因為它們是專門為在文本中搜索子字符串而構建的。

您只需要re.search塊。 我添加了findallfinditer的示例來揭開它們的神秘面紗。

# lets pretend these 4 sentences in `text` are 4 different files
text = '''Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum'''.split(sep='. ')

# add more keywords
keywords=[r'publishing', r'industry']
regex = '|'.join(keywords)
import re
for t in text:
    lst = re.findall(regex, t, re.I) # re.I make case-insensitive
    for el in lst:
        print(el)

    iterator = re.finditer(regex, t, re.I)
    for el in iterator:
        print(el.span())

    if re.search(regex, t, re.I):
        print('Keyword found in `' + t + '`\n')
    else:
        print('No keyword in `' + t + '`\n')

Output:

industry
(65, 73)
Keyword found in `Lorem Ipsum is simply dummy text of the printing and typesetting industry`

industry
(25, 33)
Keyword found in `Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book`

No keyword in `It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged`

publishing
(132, 142)
Keyword found in `It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum`

你可以更換

if (keywords in text):
   ...

if any(keyword in text for keyword in keywords):
   ...

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM