
Python: search a file for a list of words

First, I tried searching a file for a single word with this code:

import re

shakes = open("tt.txt", "r")

for line in shakes:
    if re.match("(.*)(H|h)appy(.*)", line):
        print(line, end="")

But what if I need to check for multiple words? I was thinking that something like a for loop could work, searching the file once for each word in the list.

Would that be a reasonable approach?

Just join the words in word_list with | as the delimiter. The (?i) modifier makes the match case-insensitive.

for line in shakes:
    if re.search(r"(?i)" + "|".join(word_list), line):
        print(line, end="")

Example:

>>> f = ['hello','foo','bar']
>>> s = '''hello
hai
Foo
Bar'''.splitlines()
>>> for line in s:
        if re.search(r"(?i)"+'|'.join(f), line):
            print(line)


hello
Foo
Bar
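
One caveat with joining the raw words: any regex metacharacter in a word (such as . or +) is treated as pattern syntax, and foo also matches inside food. A minimal sketch that escapes each word and anchors on word boundaries, assuming whole-word matching is what you want and that every word starts and ends with a letter, digit, or underscore (the helper name build_pattern is made up for illustration):

```python
import re

def build_pattern(words):
    """Compile one case-insensitive pattern matching any whole word in `words`."""
    # re.escape makes characters like '.' or '+' match literally.
    alternation = "|".join(re.escape(w) for w in words)
    # \b requires a word boundary on both sides, so 'foo' does not match 'food'.
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

pattern = build_pattern(["hello", "foo", "bar"])
lines = ["hello", "hai", "Foo", "Bar", "food"]
matching = [line for line in lines if pattern.search(line)]
print(matching)
```

Compiling the pattern once, outside the loop, also avoids rebuilding the joined string for every line.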

Without regex:

>>> f = ['hello','foo','bar']
>>> s = '''hello
hai
Foo
Bar'''.splitlines()
>>> for line in s:
        if any(i.lower() in line.lower() for i in f):
            print(line)


hello
Foo
Bar
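
If you also need to know which words were found, the same substring test can return the matching words for each line. A small sketch; words_in_line is a hypothetical helper name, and like the code above it matches substrings, so foo is also found in food:

```python
def words_in_line(words, line):
    """Return the words that occur in `line` as case-insensitive substrings."""
    lowered = line.lower()  # lowercase the line once, not once per word
    return [w for w in words if w.lower() in lowered]

words = ["hello", "foo", "bar"]
for line in ["hello", "hai", "Foo Bar"]:
    found = words_in_line(words, line)
    if found:
        print(line, "->", found)
```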

I don't think regex is the most Pythonic choice here, since a pattern is a bit implicit about what it matches. If speed doesn't matter too much, I'd use plain loops:

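def find_word(word_list, line):
    # Return the line if it contains any of the words, otherwise None.
    for word in word_list:
        if word in line:
            return line

# word_list is assumed to hold lowercase words.
with open('/path/to/file.txt') as f:
    # Lowercase each line for the test and keep only the lines that match.
    result = [line for line in f if find_word(word_list, line.lower())]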

Another idea is to use a set.

The code below assumes that all words in your file are separated by whitespace and that word_list is the list of words to look for.

shakes = open("tt.txt", "r")
words = set(word_list)
for line in shakes:
    if words & set(line.split()):
        print(line, end="")

If you want to do a case-insensitive search, you can convert each string to lowercase:

shakes = open("tt.txt", "r")
words = set(w.lower() for w in word_list)
for line in shakes:
    if words & set(line.lower().split()):
        print(line, end="")
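
Note that str.split() leaves punctuation attached, so happy, would not equal happy. One way around that, sketched here under the assumption that a "word" is a run of letters, digits, or underscores, is to tokenize with re.findall before intersecting the sets (lines_with_words is an illustrative name, and io.StringIO stands in for the real file):

```python
import io
import re

def lines_with_words(lines, word_list):
    """Yield lines that share at least one whole word with `word_list`, ignoring case."""
    words = {w.lower() for w in word_list}
    for line in lines:
        # \w+ extracts runs of word characters, dropping commas, periods, etc.
        tokens = set(re.findall(r"\w+", line.lower()))
        if words & tokens:
            yield line

# In-memory stand-in for open("tt.txt"); a real file object works the same way.
sample = io.StringIO("So happy, so glad.\nNothing here\nUNHAPPY days\nGlad tidings\n")
for line in lines_with_words(sample, ["happy", "glad"]):
    print(line, end="")
```

Because whole tokens are compared, UNHAPPY is not reported as a match for happy, unlike the substring and unanchored regex approaches.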
