
Fastest way in Python to check whether a text file contains any word from a list

Using Python, I want to check whether a text file contains any of the words from a list.

One way I can think of doing this is:

search_words = ['one', 'two', 'three']
# filePath is assumed to hold the path of the text file
with open(filePath, 'r') as f:
    # Iterate over the file line by line; looping over f.read() would yield single characters
    for line in f:
        for single_word in search_words:
            if single_word in line.split():
                print("Found {0} in {1}".format(single_word, line))

But is there a better way to do this?

Just use grep:

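import subprocess

def search_file(filename):
    words = ['one', 'two', 'three']
    # Pass an argument list instead of a shell string, so spaces or shell
    # metacharacters in the filename or words cannot break the command.
    command = ['grep', '-n']
    for word in words:
        command += ['-e', word]
    command += ['--', filename]

    # Runs the equivalent of: grep -n -e one -e two -e three -- <filename>

    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode > 1:  # 0 = match found, 1 = no match, >1 = error
        result.check_returncode()
    return result.stdout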

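The -n flag tells grep to print the line number of each match, and each -e flag introduces one pattern to look for. Note that grep matches substrings by default, so the pattern one also matches money; add the -w flag to match whole words only.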
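Because -n prefixes every match with its line number and a colon, the string returned by search_file can be turned into structured results. The following is only a minimal sketch: parse_grep_output is a hypothetical helper name, the sample text is hand-written rather than captured from a real run, and it assumes single-file output (with -r, grep also prefixes each line with the file name, which this sketch does not handle).

```python
def parse_grep_output(output):
    """Split `grep -n` output for a single file into (line_number, text) pairs."""
    matches = []
    for raw in output.splitlines():
        # Each line looks like "12:some matching text"; split on the first colon only
        number, _, text = raw.partition(":")
        if number.isdigit():
            matches.append((int(number), text))
    return matches


# Hand-written sample in the "N:text" shape, for illustration only
sample = "3:one fish\n7:two fish: red\n"
print(parse_grep_output(sample))
```

Splitting on the first colon only keeps any colons inside the matched line intact.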

You can even scan an entire directory using the -r flag:

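import subprocess

def search_dir(directory):
    words = ['one', 'two', 'three']
    # Argument list instead of a shell string, to avoid shell-quoting problems
    command = ['grep', '-n', '-r']
    for word in words:
        command += ['-e', word]
    result = subprocess.run(command + ['--', directory], capture_output=True, text=True)
    if result.returncode > 1:  # grep exits with 1 when nothing matched
        result.check_returncode()
    return result.stdout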

This only works on Unix-like systems, where grep is available. If you are using Windows, you'll need to use findstr instead.
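Where grep is not available, the same check can be done in plain Python on any platform. This is a minimal sketch, not a benchmarked solution: contains_any and file_contains_any are hypothetical helper names, and it assumes whole-word matching on whitespace-separated tokens, as in the question.

```python
def contains_any(lines, search_words):
    """Return True as soon as one line contains any of the search words."""
    wanted = set(search_words)
    # any() stops at the first matching line, so the rest is never read;
    # isdisjoint() compares the words on a line against the set in one call
    return any(not wanted.isdisjoint(line.split()) for line in lines)


def file_contains_any(path, search_words):
    """Apply contains_any to a text file, reading it lazily line by line."""
    with open(path, "r") as f:
        return contains_any(f, search_words)


print(contains_any(["zero one", "nothing here"], ["one", "two", "three"]))
```

Because tokens are produced by split(), a word followed directly by punctuation (for example "one,") does not count as a match in this sketch.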

You used the regex tag, so here is a regex way of searching (assuming that loading the file into a string is allowed).

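import re

search_words = ["wordA", "wordB"]
# Alternation matches if ANY of the words is present; chained lookaheads
# such as (?=.*wordA)(?=.*wordB) would require ALL of them.
# re.escape guards against regex metacharacters, \b keeps whole-word matches.
pattern = r"\b(?:" + "|".join(re.escape(word) for word in search_words) + r")\b"
txt = "Neque porro wordA quisquam est qui wordB dolorem ipsum quia dolor"

x = re.search(pattern, txt)
if x:
    print("YES! We have a match!")
else:
    print("No match")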
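If the file is too large to load into a single string, a regex search can also be run line by line. The sketch below is one possible way to do that, under stated assumptions: compile_any_word and matching_lines are hypothetical helper names, the pattern is an alternation (word1|word2) so that any single word counts as a match, and the sample lines stand in for an open file object.

```python
import re


def compile_any_word(search_words):
    """Build one compiled pattern that matches any of the words as a whole word."""
    alternatives = "|".join(re.escape(word) for word in search_words)
    return re.compile(r"\b(?:" + alternatives + r")\b")


def matching_lines(lines, search_words):
    """Yield (line_number, word, line) for every whole-word match."""
    pattern = compile_any_word(search_words)
    for number, line in enumerate(lines, start=1):
        for match in pattern.finditer(line):
            yield number, match.group(0), line.rstrip("\n")


# Sample lines standing in for a file; an open file object works the same way
text = ["Neque porro wordA quisquam\n", "est qui wordB dolorem\n", "ipsum quia dolor\n"]
for number, word, line in matching_lines(text, ["wordA", "wordB"]):
    print("Found {0} on line {1}: {2}".format(word, number, line))
```

Compiling the pattern once and reusing it for every line avoids rebuilding it inside the loop.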
