Trying to search for case-insensitive keywords in a log text (.txt) file
I have a log file of a conversation. I want to search the file for certain keywords that I have defined, but the log may contain each keyword in upper case, lower case, or title case.
I can pull out lines that contain a keyword in lower case, but I can't get the upper-case or title-case versions of the word. How can I solve this?
I have tried using
    if (words.title() and words.lower()) in line:
        print (searchInLines[i])
but that doesn't seem to work.
    keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
    with open("recognition_log.txt", "r", encoding="utf8") as f:
        searchInLines = f.readlines()
    f.close()
    for words in keywords:
        for i, line in enumerate(searchInLines):
            if (words.title() and words.lower()) in line:
                print (searchInLines[i])
For example, the log file contains the following sentence:
"Manchester United played Barcelona yesterday, however, the manchester club lost"
I have "manchester" in my keywords, so it will pick up the second occurrence but not the first.
How can I recognise both?
Thanks in advance!
Using regex, for example:
    import re
    keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
    with open("recognition_log.txt", "r", encoding="utf8") as f:
        searchInLines = f.readlines()
    # Substring version: also matches keywords inside longer words
    #pattern = re.compile("(" + "|".join(keywords) + ")", flags=re.IGNORECASE)
    # Whole-word version: \b anchors each keyword at word boundaries
    pattern = re.compile("(" + "|".join(r"\b{}\b".format(i) for i in keywords) + ")", flags=re.IGNORECASE)
    for line in searchInLines:
        if pattern.search(line):
            print(line)
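To see what the \b word boundaries in the pattern above change, here is a small sketch that runs on in-memory strings instead of the log file. The sample lines and the helper name matching_lines are made up for illustration, and re.escape is an added precaution (not part of the answer above) in case a keyword ever contains regex metacharacters.

```python
import re

keywords = ['excel', 'manchester']

def matching_lines(lines, keywords, whole_words=True):
    """Return the lines that contain any keyword, ignoring case."""
    # Escape each keyword so characters like "." or "+" are taken literally
    escaped = [re.escape(k) for k in keywords]
    if whole_words:
        # \b requires a word boundary on both sides of the keyword
        escaped = [r"\b{}\b".format(k) for k in escaped]
    pattern = re.compile("|".join(escaped), flags=re.IGNORECASE)
    return [line for line in lines if pattern.search(line)]

# Hypothetical sample data standing in for the log file
sample = [
    "Manchester United played Barcelona yesterday",
    "They excelled in the second half",
    "Nothing relevant here",
]

whole = matching_lines(sample, keywords)                        # whole words only
partial = matching_lines(sample, keywords, whole_words=False)   # substrings too
print(whole)
print(partial)
```

With whole-word matching, "excelled" is not treated as a hit for "excel"; with substring matching it is. Which behaviour is wanted depends on the log and the keywords.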
I was not entirely sure what you were trying to do, but I assume you want to pick out the messages (lines) that contain one of the words in keywords. Here is a simple way of doing it:
    keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
    with open("recognition_log.txt", "r", encoding="utf8") as f:
        searchInLines = f.readlines()
    for line in searchInLines:
        # Lowercase the line once, then compare against the lowercase keywords
        lowered = line.lower()
        for keyword in keywords:
            if keyword in lowered:
                print(line)
                # Stop after the first hit so a line is not printed twice
                break
First of all, you don't need f.close() when you are working with a context manager.
As for a solution, I recommend using a regular expression in this case:
    import re
    keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
    # Compile a regex pattern from the keyword list (keywords must be lowercase)
    pattern = re.compile('|'.join(keywords))
    with open("recognition_log.txt", "r", encoding="utf8") as f:
        searchInLines = f.readlines()
    for line in searchInLines:
        # Lowercase the line so the lowercase pattern matches any casing
        if re.search(pattern, line.lower()):
            print(line)
You can convert both the line and the keywords to upper case (or both to lower case) and compare them.
    keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
    with open("test.txt", "r", encoding="utf8") as f:
        searchInLines = f.readlines()
    for words in keywords:
        for i, line in enumerate(searchInLines):
            # Compare upper-cased copies so the original casing does not matter
            if words.upper() in line.upper():
                print(searchInLines[i])
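A possible variant of the same idea, in case the log contains non-ASCII text: str.casefold() is a more aggressive form of lower() intended for caseless comparison. This is only a sketch; the helper name contains_keyword and the sample strings are hypothetical.

```python
def contains_keyword(line, keywords):
    """Return True if any keyword occurs in line, ignoring case."""
    # Fold the line once, then compare against each folded keyword
    folded = line.casefold()
    return any(keyword.casefold() in folded for keyword in keywords)

keywords = ['excel', 'manchester']
print(contains_keyword("MANCHESTER United played Barcelona", keywords))
print(contains_keyword("Nothing relevant here", keywords))
```

For plain English keywords, lower() or upper() as shown above behaves the same way; casefold() only makes a difference for characters such as the German "ß".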
(1) Well, your words are already in lower case, so "words.lower()" has no effect. (2) Your example sentence would not be found unless it contained both "Manchester" AND "manchester", since you are using "and" logic. (3) What you want, I believe, is: "if words in line.lower():"
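A note on the "and" in the question's condition, as a sketch with a made-up sample line: Python evaluates the parenthesised "and" first, and for two non-empty strings "and" simply returns the second one. So the condition ends up testing only the lower-case form, rather than requiring both forms to be present.

```python
word = "manchester"
line = "Manchester United played Barcelona yesterday"

# `a and b` returns b when a is truthy; a non-empty string is truthy
combined = word.title() and word.lower()

# The question's test is therefore the same as `word.lower() in line`
original_test = combined in line

# Checking each form separately needs two `in` tests joined with `or`
either_case = word.title() in line or word.lower() in line

# Lowercasing the line covers every casing, including all upper case
any_case = word.lower() in line.lower()

print(combined, original_test, either_case, any_case)
```

The last form is the one suggested in point (3) and is usually the simplest to read.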