Python: search a file for a list of words

At first I tried searching the file for a single word with this code:

import re

shakes = open("tt.txt", "r")

for line in shakes:
    # Print every line that contains "Happy" or "happy"
    if re.match("(.*)(H|h)appy(.*)", line):
        print(line, end="")

But what if I need to check for multiple words? I was thinking that a for loop might work, searching the file once for each word in the list.

Would that be a reasonable approach?

Just join word_list with | as the delimiter, so the pattern matches any one of the words. The (?i) modifier at the start of the pattern makes the match case-insensitive.

for line in shakes:
    if re.search(r"(?i)" + "|".join(word_list), line):
        print(line, end="")

Example:

>>> f = ['hello','foo','bar']
>>> s = '''hello
hai
Foo
Bar'''.splitlines()
>>> for line in s:
        if re.search(r"(?i)"+'|'.join(f), line):
            print(line)


hello
Foo
Bar
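
One caveat with joining the words directly: a word containing regex metacharacters (such as . or +) is read as pattern syntax, and foo also matches inside foobar. A minimal sketch of one way around both, assuming whole-word matches are what you want. The names build_pattern and matching_lines are made up for this illustration:

```python
import re

def build_pattern(words):
    """Compile one case-insensitive pattern matching any of the words as whole words."""
    # re.escape keeps characters such as '.' or '+' from being read as regex syntax.
    alternatives = "|".join(re.escape(w) for w in words)
    # \b on both sides restricts the match to whole words.
    return re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)

def matching_lines(lines, words):
    """Return the lines that contain at least one of the words."""
    pattern = build_pattern(words)
    return [line for line in lines if pattern.search(line)]

sample = ["hello", "hai", "Foo", "Bar", "foobar", "a.b", "axb"]
print(matching_lines(sample, ["hello", "foo", "bar", "a.b"]))
# ['hello', 'Foo', 'Bar', 'a.b']
```

Compiling the pattern once also avoids rebuilding the joined string for every line of the file.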

Without regex:

>>> f = ['hello','foo','bar']
>>> s = '''hello
hai
Foo
Bar'''.splitlines()
>>> for line in s:
        if any(i.lower() in line.lower() for i in f):
            print(line)


hello
Foo
Bar
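
Note that the in test is a substring check, so foo is also found inside food. If you want to know which words were found rather than just whether any matched, something along these lines may help. This is only a sketch, and found_words is a hypothetical helper name:

```python
def found_words(line, words):
    """Return the words that occur in the line as case-insensitive substrings."""
    lowered = line.lower()  # lowercase the line once, not once per word
    return [w for w in words if w.lower() in lowered]

words = ["hello", "foo", "bar"]
for line in ["hello", "hai", "Foo", "Barfood"]:
    print(repr(line), found_words(line, words))
# 'hello' ['hello']
# 'hai' []
# 'Foo' ['foo']
# 'Barfood' ['foo', 'bar']
```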

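I don't think regex is the most Pythonic choice here, since a regex hides what is actually being matched. If speed doesn't matter too much, I'd use a plain loop:

def find_word(word_list, line):
    # Return True if any word occurs in the line (case-insensitive substring match)
    lowered = line.lower()
    return any(word.lower() in lowered for word in word_list)

with open('/path/to/file.txt') as f:
    # Keep only the lines that contain at least one of the words
    result = [line for line in f if find_word(word_list, line)]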

Another idea is to use a set.

The code below assumes that all words in your file are separated by whitespace and that word_list is the list of words to look for. A word with punctuation attached, such as "happy,", will not match.

words = set(word_list)
with open("tt.txt", "r") as shakes:
    for line in shakes:
        # A non-empty intersection means the line contains at least one of the words
        if words & set(line.split()):
            print(line, end="")

If you want to do a case-insensitive search, you can convert each string to lowercase:

words = set(w.lower() for w in word_list)
with open("tt.txt", "r") as shakes:
    for line in shakes:
        # Lowercase the line so it compares equal to the lowercased words
        if words & set(line.lower().split()):
            print(line, end="")
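
Splitting on whitespace leaves punctuation attached, so "happy," is not equal to "happy". A possible variant, sketched under the assumption that a word is a run of ASCII letters, digits, or apostrophes. The names line_words and lines_with_any are invented for this example:

```python
import re

def line_words(line):
    """Return the set of lowercase words in a line, ignoring punctuation."""
    # Assumption: a "word" is a run of ASCII letters, digits, or apostrophes.
    return set(re.findall(r"[a-z0-9']+", line.lower()))

def lines_with_any(lines, word_list):
    """Return the lines that share at least one whole word with word_list."""
    wanted = {w.lower() for w in word_list}
    return [line for line in lines if wanted & line_words(line)]

sample = ["I am happy, truly.", "Unhappy days", "Nothing here", "HAPPY!"]
print(lines_with_any(sample, ["happy", "glad"]))
# ['I am happy, truly.', 'HAPPY!']
```

Because whole words are compared, "Unhappy" does not count as a match for "happy" here, unlike with the substring approaches.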
