Python: How to use list of keywords to search for a string in a text

Question

So I'm writing a program that loops through multiple .txt files and searches for any number of pre-specified keywords. I'm having some trouble finding a way to pass the keyword list to the search.

The code below currently returns the following error:

TypeError: 'in <string>' requires string as left operand, not list

I'm aware that the error is caused by the keyword list, but I have no idea how to search for a large list of keywords without running into this error.

Current code:

from os import listdir

keywords=['Example', 'Use', 'Of', 'Keywords']
 
with open("/home/user/folder/project/result.txt", "w") as f:
    for filename in listdir("/home/user/folder/project/data"):
        with open('/home/user/folder/project/data/' + filename) as currentFile:
            text = currentFile.read()
            #Error Below
            if (keywords in text):
                f.write('Keyword found in ' + filename[:-4] + '\n')
            else:
                f.write('No keyword in ' + filename[:-4] + '\n')

The error is indicated in line 10 in the above code under the commented section. I'm unsure as to why I can't call a list to be able to search for the keywords. Any help is appreciated, thanks!

Answer 1

Try looping through the list to check whether each element is in the text:

for keyword in keywords:
    if keyword in text:
        f.write('Keyword found in ' + filename[:-4] + '\n')
        break
else:
    # Runs only if the loop finished without break, i.e. no keyword matched
    f.write('No keyword in ' + filename[:-4] + '\n')

You cannot use in to see if a list is in a string; the left operand of in must itself be a string.
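
If it is also useful to know which keywords matched, and not only whether one did, a minimal sketch could collect them with a list comprehension. The helper name matched_keywords and the sample sentence are made up for illustration; they are not part of the question's code.

```python
def matched_keywords(text, keywords):
    """Return the keywords that occur as substrings of text, in list order."""
    return [keyword for keyword in keywords if keyword in text]


keywords = ['Example', 'Use', 'Of', 'Keywords']
sample = 'An Example sentence showing the Use of a list.'
found = matched_keywords(sample, keywords)
print(found)  # ['Example', 'Use'] ('Of' is absent because the match is case-sensitive)
```

An empty result means no keyword was found, so the returned list can be used directly in an if statement.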

Answer 2

I would use regular expressions as they are purpose-built for searching text for substrings.

You only need the re.search block. I added examples of findall and finditer to demystify them.

# let's pretend these 4 sentences in `text` are 4 different files
text = '''Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum'''.split(sep='. ')

import re

# add more keywords
keywords = [r'publishing', r'industry']
regex = '|'.join(keywords)  # alternation: matches any one of the keywords
for t in text:
    lst = re.findall(regex, t, re.I)  # re.I makes the match case-insensitive
    for el in lst:
        print(el)

    iterator = re.finditer(regex, t, re.I)
    for el in iterator:
        print(el.span())

    if re.search(regex, t, re.I):
        print('Keyword found in `' + t + '`\n')
    else:
        print('No keyword in `' + t + '`\n')

Output:

industry
(65, 73)
Keyword found in `Lorem Ipsum is simply dummy text of the printing and typesetting industry`

industry
(25, 33)
Keyword found in `Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book`

No keyword in `It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged`

publishing
(132, 142)
Keyword found in `It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum`
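
One caveat with joining the keywords as-is: a keyword containing regex metacharacters such as . would be read as a pattern, and a short keyword like Of would also match inside longer words. The sketch below shows one possible way to handle both, assuming whole-word matching is wanted and that every keyword starts and ends with a letter, digit, or underscore (otherwise \b does not behave as expected). The helper name build_pattern and the sample keywords are hypothetical.

```python
import re


def build_pattern(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole word."""
    # re.escape makes each keyword literal; \b anchors the match at word boundaries
    alternation = '|'.join(re.escape(keyword) for keyword in keywords)
    return re.compile(r'\b(?:' + alternation + r')\b', re.IGNORECASE)


pattern = build_pattern(['Of', 'v1.0'])
print(bool(pattern.search('A galley of type')))    # True: 'of' is a whole word
print(bool(pattern.search('An office printer')))   # False: 'of' only occurs inside 'office'
print(bool(pattern.search('release v1x0 notes')))  # False: the escaped '.' is literal
```

Compiling the pattern once and reusing it for every file avoids rebuilding the alternation on each iteration.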

Answer 3

You could replace

if (keywords in text):
   ...

with

if any(keyword in text for keyword in keywords):
   ...
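
Note that in is case-sensitive, so 'Of' will not match 'of'. A minimal sketch of wrapping the any() check in a helper with optional case folding follows; contains_any and ignore_case are hypothetical names, not part of the question's code.

```python
def contains_any(text, keywords, ignore_case=False):
    """Return True if at least one keyword occurs as a substring of text."""
    if ignore_case:
        text = text.lower()
        keywords = [keyword.lower() for keyword in keywords]
    # any() stops at the first keyword that is found
    return any(keyword in text for keyword in keywords)


keywords = ['Example', 'Use', 'Of', 'Keywords']
print(contains_any('a list of words', keywords))                    # False: 'Of' is not 'of'
print(contains_any('a list of words', keywords, ignore_case=True))  # True
```

In the question's loop, the condition would then read if contains_any(text, keywords): with the two f.write branches left unchanged.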
