Python: How to search for strings in text using a list of keywords

Question

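I'm writing a program that loops over multiple .txt files and searches each one for any number of pre-specified keywords. I'm having trouble finding a way to pass in the list of keywords to search for.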

The code below currently raises the following error:

TypeError: 'in <string>' requires string as left operand, not list

I know the error is caused by the keyword list, but I don't know how to supply a large number of keywords without running into this error.

Current code:

from os import listdir

keywords=['Example', 'Use', 'Of', 'Keywords']
 
with open("/home/user/folder/project/result.txt", "w") as f:
    for filename in listdir("/home/user/folder/project/data"):
        with open('/home/user/folder/project/data/' + filename) as currentFile:
            text = currentFile.read()
            #Error Below
            if (keywords in text):
                f.write('Keyword found in ' + filename[:-4] + '\n')
            else:
                f.write('No keyword in ' + filename[:-4] + '\n')

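The error is reported on line 10 of the code above, just below the comment. I'm not sure why I can't use the list directly to search for the keywords. Any help is appreciated, thanks!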

Answer 1

Try iterating over the list to check whether each element is in the text:

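for keyword in keywords:
    if keyword in text:
        f.write('Keyword found in ' + filename[:-4] + '\n')
        break
else:
    # The else branch of a for loop runs only if the loop never hit break,
    # i.e. none of the keywords was found in the text.
    f.write('No keyword in ' + filename[:-4] + '\n')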

Also, you cannot use in to check whether a list is contained in a string; the left operand has to be a string.
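To illustrate that point, here is a minimal sketch of what the in operator accepts when the right-hand side is a string. The sample text and the variable names single_ok and list_error are made up for this demonstration and are not part of the original program.

```python
text = "An Example sentence showing the Use of keywords."
keywords = ['Example', 'Use', 'Of', 'Keywords']

# A single string on the left-hand side is a plain substring test.
single_ok = keywords[0] in text

# A list on the left-hand side is rejected with a TypeError.
try:
    keywords in text
    list_error = None
except TypeError as exc:
    list_error = str(exc)

print(single_ok)
print(list_error)
```

So each keyword has to be tested on its own, which is what the loop above does.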

Answer 2

I would use regular expressions, since they are built specifically for searching for substrings in text.

You only need the re.search block. I added examples of findall and finditer to demystify them.

# Let's pretend these 4 sentences in `text` are 4 different files
text = '''Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum'''.split(sep='. ')

import re

# Add more keywords here as needed
keywords = [r'publishing', r'industry']
regex = '|'.join(keywords)
for t in text:
    lst = re.findall(regex, t, re.I)  # re.I makes the match case-insensitive
    for el in lst:
        print(el)

    iterator = re.finditer(regex, t, re.I)
    for el in iterator:
        print(el.span())

    if re.search(regex, t, re.I):
        print('Keyword found in `' + t + '`\n')
    else:
        print('No keyword in `' + t + '`\n')

Output:

industry
(65, 73)
Keyword found in `Lorem Ipsum is simply dummy text of the printing and typesetting industry`

industry
(25, 33)
Keyword found in `Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book`

No keyword in `It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged`

publishing
(132, 142)
Keyword found in `It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum`
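One caveat with joining raw keywords using '|': a keyword that contains regex metacharacters (such as . or +) is interpreted as a pattern, and a short keyword also matches inside longer words. The sketch below is one possible way to handle both, assuming whole-word, case-insensitive matching is what you want. The helper name build_pattern and the sample keywords are hypothetical.

```python
import re

def build_pattern(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole word."""
    # re.escape makes every keyword match literally, so '.' means a dot.
    escaped = (re.escape(keyword) for keyword in keywords)
    # \b anchors the alternatives at word boundaries.
    return re.compile(r'\b(?:' + '|'.join(escaped) + r')\b', re.IGNORECASE)

pattern = build_pattern(['industry', '3.5', 'use'])

print(bool(pattern.search("Python 3.5 was released")))   # literal "3.5"
print(bool(pattern.search("The INDUSTRY standard")))     # case-insensitive
print(bool(pattern.search("Build 345 because")))         # no literal or whole-word hit
```

Note that \b only behaves as expected for keywords that start and end with a word character (letters, digits, underscore), so this sketch would need adjusting for keywords that begin or end with punctuation.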

Answer 3

You can replace

if (keywords in text):
   ...

with

if any(keyword in text for keyword in keywords):
   ...
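Put together with the loop from the question, that check might look like the sketch below. It is only one way to arrange it: the function names contains_any and write_report are hypothetical, and pathlib is swapped in for os.listdir purely for convenience.

```python
from pathlib import Path

def contains_any(text, keywords):
    """Return True if at least one keyword occurs in text (case-sensitive)."""
    return any(keyword in text for keyword in keywords)

def write_report(data_dir, result_path, keywords):
    """Write one line per file in data_dir saying whether any keyword was found."""
    with open(result_path, "w") as report:
        for path in sorted(Path(data_dir).iterdir()):
            if not path.is_file():
                continue  # skip subdirectories
            text = path.read_text()
            if contains_any(text, keywords):
                prefix = 'Keyword found in '
            else:
                prefix = 'No keyword in '
            # path.stem drops the extension, like filename[:-4] does for ".txt"
            report.write(prefix + path.stem + '\n')
```

With the paths from the question, the call would be write_report("/home/user/folder/project/data", "/home/user/folder/project/result.txt", keywords). any() stops at the first keyword that matches, so it does not scan the text for the remaining keywords once one is found.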

3 answers

Answer 1
0 2021-03-08 02:28:45

Answer 2
0 2021-03-08 02:48:15

Answer 3
0 accepted 2021-03-08 03:05:08
