
Trying to search case insensitive keywords from a log text (.txt) file

I have a log file of a conversation. I want to search the file for certain keywords that I have defined, but the log file may contain the upper-case, lower-case, or title-case form of any keyword I am searching for.

I can pull out the lines that contain a keyword in lower case, but I can't get the upper-case or title-case versions of the word. How can I solve this?

I have tried using

if (words.title() and words.lower()) in line:
     print (searchInLines[i])

but that doesn't seem to work.

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']


with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()
    f.close()

for words in keywords:
    for i, line in enumerate(searchInLines):
        if (words.title() and words.lower()) in line:
            print (searchInLines[i])

For example, the log file contains the following sentence:

"Manchester United played Barcelona yesterday, however, the manchester club lost"

I have "manchester" in my keywords, so it picks up the second occurrence but not the first one.

How can I recognise both?

Thanks in advance!

Using Regex

Ex:

import re

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']


with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

#pattern = re.compile("(" + "|".join(keywords) + ")", flags=re.IGNORECASE)
pattern = re.compile("(" + "|".join(r"\b{}\b".format(i) for i in keywords) + ")", flags=re.IGNORECASE)
for line in searchInLines:
    if pattern.search(line):
        print(line)
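
The commented-out pattern and the `\b` pattern above behave differently: without word boundaries a keyword also matches inside a longer word. The sketch below illustrates that difference on two made-up lines. The names `substring_pattern`, `word_pattern` and `sample_lines` are hypothetical, and wrapping each keyword in `re.escape` is an added precaution (not part of the answer above) in case a keyword ever contains regex metacharacters.

```python
import re

keywords = ["excel", "manchester"]

# Substring pattern: a keyword matches anywhere, even inside a longer word.
substring_pattern = re.compile(
    "|".join(re.escape(k) for k in keywords),
    flags=re.IGNORECASE,
)

# Whole-word pattern: \b anchors each keyword at word boundaries.
word_pattern = re.compile(
    "|".join(r"\b{}\b".format(re.escape(k)) for k in keywords),
    flags=re.IGNORECASE,
)

# Made-up sample data for illustration only.
sample_lines = [
    "Manchester United played Barcelona yesterday",
    "That was an EXCELLENT result",
]

substring_hits = [line for line in sample_lines if substring_pattern.search(line)]
word_hits = [line for line in sample_lines if word_pattern.search(line)]

print(substring_hits)  # both lines: "EXCELLENT" contains "excel"
print(word_hits)       # only the first line
```

Which of the two is appropriate depends on whether partial-word hits such as "excellent" should count as a match for "excel".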

I was not entirely sure what you were trying to do, but I assume you want to filter for the messages (lines) that contain one of the words in keywords. Here is a simple way of doing it:

keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

for line in searchInLines:
    for keyword in keywords:
        if keyword in line.lower():
            print(line)
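
One thing to be aware of with the nested loop: a line that contains several keywords is printed once per matching keyword. If each line should be reported only once, a minimal sketch using `any()` could look like the following. The helper name `find_matching_lines` and the sample lines are hypothetical and only serve the illustration.

```python
keywords = ["bimbo", "qualified", "tornadoes", "alteryx", "excel", "manchester"]


def find_matching_lines(lines, keywords):
    """Return the lines that contain at least one keyword, ignoring case."""
    lowered_keywords = [k.lower() for k in keywords]
    matches = []
    for line in lines:
        lowered_line = line.lower()  # lower-case the line once, not once per keyword
        if any(k in lowered_line for k in lowered_keywords):
            matches.append(line)
    return matches


# Made-up sample data; in the question these lines come from f.readlines().
sample_lines = [
    "Manchester United played Barcelona yesterday\n",
    "I am QUALIFIED to use Excel\n",
    "Nothing relevant here\n",
]

result = find_matching_lines(sample_lines, keywords)
for line in result:
    print(line, end="")
```

The second sample line contains two keywords but appears in `result` only once.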

First of all, you don't need f.close() when you are working with a context manager.

As for the solution, I recommend using a regular expression in this case:

import re
keywords=['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']
# Compile a regex pattern from the keyword list
pattern = re.compile('|'.join(keywords))

with open("recognition_log.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

for line in searchInLines:
    # if we get a match
    if re.search(pattern, line.lower()):
        print(line)

You can convert both the line and the keywords to upper or lower case and compare them.

keywords = ['bimbo', 'qualified', 'tornadoes', 'alteryx', 'excel', 'manchester']

with open("test.txt", "r", encoding="utf8") as f:
    searchInLines = f.readlines()

for words in keywords:
    for i, line in enumerate(searchInLines):
        if words.upper() in line.upper():
            print(searchInLines[i])
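
For plain ASCII keywords, upper() and lower() give the same result. If the log may contain non-English text, Python's `str.casefold()` is intended for caseless comparison and handles some cases that lower() misses. The sketch below is only an illustration: the helper `contains_keyword` and the German sample text are assumptions, not part of the question.

```python
def contains_keyword(line, keyword):
    """Caseless substring test using str.casefold()."""
    return keyword.casefold() in line.casefold()


# Hypothetical example: lower() leaves the German sharp s untouched,
# whereas casefold() maps it to "ss".
line = "Die STRASSE ist gesperrt"
keyword = "straße"

lower_result = keyword.lower() in line.lower()
casefold_result = contains_keyword(line, keyword)

print(lower_result)     # False
print(casefold_result)  # True
```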

(1) Your keywords are already in lower case, so "words.lower()" has no effect. (2) Because of the "and", the condition does not test both forms: "words.title() and words.lower()" evaluates to just "words.lower()", so only the lower-case form is looked up and a line containing only "Manchester" is never found. (3) What you want, I believe, is: "if words in line.lower():"
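
To see what the condition from the question actually evaluates, here is a short sketch with a made-up line. In Python, `a and b` returns `b` whenever `a` is truthy, so the parenthesised expression collapses to a single string before `in` is applied. The variable names are hypothetical.

```python
word = "manchester"
line = "Manchester United played Barcelona yesterday"

# "and" returns its second operand when the first one is truthy.
combined = word.title() and word.lower()
print(combined)  # manchester

# So the original test only looks for the lower-case form.
original_test = (word.title() and word.lower()) in line
print(original_test)  # False

# Testing each form explicitly works here, but still misses other casings.
either_form = word.title() in line or word.lower() in line
shouting = "MANCHESTER WON"
either_form_upper = word.title() in shouting or word.lower() in shouting

# Lower-casing the line covers every casing of the keyword.
lowered_test = word in line.lower()
lowered_test_upper = word in shouting.lower()
print(either_form, either_form_upper, lowered_test, lowered_test_upper)
```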
