
Trying to search case insensitive keywords from a log text (.txt) file

I have a log file of a conversation. I want to search the file for a set of keywords, but the log may contain those keywords in uppercase, lowercase, or title case.

I can pull out the lines that have the keyword in lowercase, but I can't match the uppercase or title-case versions of the word. How can I solve this?

I have tried using

if (words.title() and words.lower()) in line:
     print (searchInLines[i])

but that doesn't seem to work.

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']


with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()
    f.close()

for words in keywords:
    for i, line in enumerate(searchInLines):
        if (words.title() and words.lower()) in line:
            print (searchInLines[i])

For example, the log file contains the following sentence:

"Manchester United played Barcelona yesterday, however, the manchester club lost"

I have "manchester" in my keywords so it will pick up the second one but not the first one.

How can I recognise both?

Thanks in Advance!

Use a regular expression with the re.IGNORECASE flag.

For example:

import re

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']


with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

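# Without word boundaries, a keyword would also match inside longer words:
# pattern = re.compile("(" + "|".join(keywords) + ")", flags=re.IGNORECASE)
# \b limits each keyword to whole-word matches; re.escape keeps any regex metacharacters literal.
pattern = re.compile("(" + "|".join(r"\b{}\b".format(re.escape(i)) for i in keywords) + ")", flags=re.IGNORECASE)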
for line in searchInLines:
    if pattern.search(line):
        print(line)
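
To try the same idea without a log file, here is a minimal sketch that runs the pattern over an in-memory list. The helper name matching_lines and the sample sentences are made up for illustration; only re.compile, re.escape and re.IGNORECASE come from the standard library.

```python
import re

keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

# Escape each keyword so it is matched literally, and wrap the whole
# alternation in \b so that only whole words match.
pattern = re.compile(
    r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
    flags=re.IGNORECASE,
)

def matching_lines(lines):
    """Return the lines that contain at least one keyword, in any case."""
    return [line for line in lines if pattern.search(line)]

# Hypothetical sample data standing in for the log file.
sample = [
    "Manchester United played Barcelona yesterday",
    "the manchester club lost",
    "EXCEL crashed again",
    "nothing excellent here",
]
result = matching_lines(sample)
```

The last sample line is skipped because the word boundary stops "excel" from matching inside "excellent". If substring matches are wanted, dropping the two \b markers should be enough.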

I was not entirely sure what you were trying to do, but I assume you want to pick out the messages (lines) that contain one of the words in keywords. Here is a simple way of doing it:

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

for line in searchInLines:
    lowered = line.lower()  # lowercase the line once, not once per keyword
    for keyword in keywords:
        if keyword in lowered:
            print(line)
            break  # print each matching line only once
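
If it also helps to know which keywords were found and on which line, the same lowercase comparison can be wrapped in a small function. This is only a sketch: find_keyword_lines and the sample lines are hypothetical names and data, not part of the original code.

```python
def find_keyword_lines(lines, keywords):
    """Return (line_number, line_text, matched_keywords) for each matching line."""
    lowered_keywords = [k.lower() for k in keywords]
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        # Plain substring test, so "excel" would also match "excellent".
        matched = [k for k in lowered_keywords if k in lowered]
        if matched:
            hits.append((number, line.rstrip("\n"), matched))
    return hits

# Hypothetical sample data standing in for f.readlines().
sample_lines = [
    "Manchester United played Barcelona yesterday\n",
    "I opened the file in Excel, then in ALTERYX\n",
    "nothing to see here\n",
]
keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
hits = find_keyword_lines(sample_lines, keywords)
```

Each line appears at most once in hits, however many keywords it contains.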

First of all, you don't need f.close() when you are working with a context manager.

As for a solution, I recommend using a regular expression in this case:

import re
keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
# Compile a regex pattern from the keyword list (re.escape keeps each keyword literal)
pattern = re.compile('|'.join(re.escape(k) for k in keywords))

with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

for line in searchInLines:
    # if we get a match
    if re.search(pattern, line.lower()):
        print(line)

You can convert both the line and the keywords to upper or to lower case and compare them.

keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

with open("test.txt", "r", encoding="utf8") as f:
    # The with block closes the file automatically, so no f.close() is needed
    searchInLines = f.readlines()

for words in keywords:
    for i, line in enumerate(searchInLines):
        if words.upper() in line.upper():
            print(searchInLines[i])
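
For text that may contain non-English characters, str.casefold() is a more aggressive alternative to upper() or lower() for caseless comparison. The sketch below assumes a hypothetical helper named contains_keyword; the German example is only there to show a case where lower() is not enough.

```python
def contains_keyword(line, keyword):
    """Caseless substring test using str.casefold() on both sides."""
    return keyword.casefold() in line.casefold()

# Works the same as lower()/upper() for plain ASCII keywords.
ascii_match = contains_keyword("Manchester United played Barcelona", "manchester")

# casefold() maps the German sharp s to "ss", which lower() does not do.
folded_match = contains_keyword("STRASSE 12", "straße")
lowered_match = "straße".lower() in "STRASSE 12".lower()
```

For the keywords in the question, all three approaches (upper, lower, casefold) should give the same result.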

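(1) Your keywords are already in lowercase, so "words.lower()" has no effect.

(2) The expression "(words.title() and words.lower())" does not test both forms. "and" returns its second operand when the first one is truthy, so the condition reduces to "words.lower() in line" and only the lowercase form is ever matched. A line that contains only "Manchester" is therefore never found.

(3) What you want, I believe, is: "if words in line.lower():"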
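
The behaviour of the original condition can be checked in a few lines. The variable names here are made up for the demonstration, and the sample line deliberately contains only the title-case form.

```python
word = "manchester"
line = "Manchester United played Barcelona yesterday"  # title case only

# "and" returns its second operand when the first is truthy,
# so this is just the lowercase string.
combined = word.title() and word.lower()

# The check from the question: only the lowercase form is tested.
original_check = combined in line

# Two ways to accept either form.
either_check = word.title() in line or word.lower() in line
lowered_check = word in line.lower()
```

Lowercasing the line also covers forms such as "MANCHESTER", which the explicit "or" version would miss.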
