
How to check whether the contents of file A exist in the files in a directory

I have a file that has several lines of text, let's say:

cat
dog
rabbit

I would like to traverse a directory and check whether any text files contain the items in the list above.

I have tried many things in many different ways. I did not want to post anything because I wanted a fresh start, a fresh line of thinking. I have reworked the code below to the point that I don't even understand what's going on, and I'm completely lost. :(

#! /usr/bin/python

'''
The purpose of this program
is to search the OS file system
in order to find a txt file that contain the nagios host entries
'''

import os

host_list = open('/path/path/list', 'r')

host = host_list.read()
##for host in host_remove.read():

host_list.close()
#print host

for root, dirs, files in os.walk("/path/path/somefolder/"):
    for file in files:
        if file.endswith(".txt"):

            check_file = os.path.join(root, file)
            #print check_file


            if host.find(check_file): #in check_file:

                print host.find(check_file)                    
                #print host+" is found in "+check_file
                #print os.path.join(root, file)
            else:
                break

Python is way, way overkill for this task. Just use grep:

$ grep -wFf list_of_needles.txt some_target.txt

If you really need to use Python, wrap a grep call in subprocess or similar.
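For what it's worth, wrapping the grep call could look roughly like the sketch below. It assumes Python 3.7+ and that grep is on the PATH; the function name grep_words is made up for illustration.

```python
import subprocess


def grep_words(needles_path, target_path):
    """Return the lines of target_path that contain any whole word from needles_path."""
    # -w: match whole words only, -F: fixed strings (no regex),
    # -f FILE: read the patterns from FILE, one per line.
    result = subprocess.run(
        ["grep", "-wF", "-f", needles_path, "--", target_path],
        capture_output=True,
        text=True,
    )
    # grep exits with 0 when something matched, 1 when nothing matched,
    # and a value above 1 when an error occurred (e.g. unreadable file).
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

Calling grep_words("list_of_needles.txt", "some_target.txt") should give the same lines the shell command prints, as a list of strings.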

An analog of the shell command:

$ find /path/somefolder/ -name \*.txt -type f -exec grep -wFf /path/list {} +

in Python:

#!/usr/bin/env python
import os
import re
import sys

def files_with_matched_lines(topdir, matched):
    for root, dirs, files in os.walk(topdir, topdown=True):
        dirs[:] = [d for d in dirs if not d.startswith('.')] # skip "hidden" dirs
        for filename in files:
            if filename.endswith(".txt"):
                path = os.path.join(root, filename)
                try:
                    with open(path) as file:
                        for line in file:
                            if matched(line):
                                yield path
                                break
                except EnvironmentError as e:
                    print >>sys.stderr, e

with open('/path/list') as file:
    hosts = file.read().splitlines()
matched = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, hosts))).search
for path in files_with_matched_lines("/path/somefolder/", matched):
    print path
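The script above prints only the matching paths. If you also want to know which entries were found in each file (as the commented-out print in the question suggests), a variant along these lines might do. This is a Python 3 sketch, not the code above; load_words and matches_by_file are hypothetical names, and blank lines in the list are skipped on the assumption that they are not meant as search terms.

```python
import os
import re
import sys


def load_words(list_path):
    """Read one search term per line, ignoring blank lines."""
    with open(list_path) as f:
        # An empty alternative in the regex would match at every word
        # boundary, so blank lines are dropped here.
        return [line.strip() for line in f if line.strip()]


def matches_by_file(topdir, words):
    """Map each .txt path under topdir to the set of listed words it contains."""
    if not words:
        return {}
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, words)))
    found = {}
    for root, dirs, files in os.walk(topdir):
        for filename in files:
            if not filename.endswith(".txt"):
                continue
            path = os.path.join(root, filename)
            hits = set()
            try:
                # errors="replace" keeps undecodable bytes from aborting the scan
                with open(path, errors="replace") as f:
                    for line in f:
                        hits.update(pattern.findall(line))
            except OSError as e:
                print(e, file=sys.stderr)
                continue
            if hits:
                found[path] = hits
    return found
```

With the paths from the answer, matches_by_file("/path/somefolder/", load_words("/path/list")) would return a dict such as {path: {"cat", "dog"}}, which you can loop over to print "X is found in Y" lines.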

I've made some minor changes to the algorithm provided by J.F. Sebastian. The script now asks for user input. It also runs on Windows with no issues.

#!/usr/bin/env python
import os
import re
import sys

contents = raw_input("Please provide the full path and file name that contains the items you would like to search for \n")
print "\n"
print "\n"
direct = raw_input("Please provide the directory you would like to search. \
Use C:/, if you want to search the root directory on a windows machine\n")

def files_with_matched_lines(topdir, matched):
    for root, dirs, files in os.walk(topdir, topdown=True):
        dirs[:] = [d for d in dirs if not d.startswith('.')] # skip "hidden" dirs
        for filename in files:
            if filename.endswith(".txt"):
                path = os.path.join(root, filename)
                try:
                    with open(path) as file:
                        for line in file:
                            if matched(line):
                                yield path
                                break
                except EnvironmentError as e:
                    print >>sys.stderr, e

with open(contents) as file:
    hosts = file.read().splitlines()
matched = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, hosts))).search
for path in files_with_matched_lines(direct, matched):
    print path
