I have a file that has several lines of text, let's say:
cat
dog
rabbit
I would like to traverse a directory and check whether any text files contain the items in the aforementioned list.
I have tried many things in many different ways. I didn't want to post any of them because I wanted a fresh start and a fresh line of thinking. I worked the code below to the point that I don't even understand what's going on, and I'm completely lost. :(
#! /usr/bin/python
'''
The purpose of this program
is to search the OS file system
in order to find a txt file that contain the nagios host entries
'''
import os
host_list = open('/path/path/list', 'r')
host = host_list.read()
##for host in host_remove.read():
host_list.close()
#print host
for root, dirs, files in os.walk("/path/path/somefolder/"):
    for file in files:
        if file.endswith(".txt"):
            check_file = os.path.join(root, file)
            #print check_file
            if host.find(check_file): #in check_file:
                print host.find(check_file)
                #print host+" is found in "+check_file
                #print os.path.join(root, file)
            else:
                break
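For reference, two things go wrong in the attempt above. `host.find(check_file)` searches the text of the list for the file's path, instead of searching the file's contents for each name; and `str.find` returns -1 (which is truthy) on a miss and 0 (falsy) for a match at index 0, so `if host.find(...)` is true almost every time. The `else: break` would also abandon the rest of the directory instead of moving on to the next file. A minimal corrected sketch in Python 3, assuming one name per line in the list file; `read_names` and `find_files_mentioning` are hypothetical helper names, not part of any library:

```python
import os


def read_names(list_path):
    """Return the non-blank lines of the list file, stripped of whitespace."""
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]


def find_files_mentioning(topdir, names):
    """Yield paths of .txt files under topdir whose text contains any of `names`.

    This is a plain substring test, so "cat" also matches "concatenate".
    """
    for root, dirs, files in os.walk(topdir):
        for filename in files:
            if not filename.endswith(".txt"):
                continue  # skip non-.txt files and keep going
            path = os.path.join(root, filename)
            # errors="replace" (an assumption) avoids crashing on non-UTF-8 files
            with open(path, errors="replace") as f:
                text = f.read()
            # Look for each name in the file's contents, not for the path in the list.
            if any(name in text for name in names):
                yield path
```

With the question's placeholder paths, usage would be `for path in find_files_mentioning("/path/path/somefolder/", read_names("/path/path/list")): print(path)`. Reading each file whole keeps the sketch short; for very large files, a line-by-line loop like the one in the answer below is gentler on memory.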
Python is way, way overkill for this task. Just use grep:
$ grep -wFf list_of_needles.txt some_target.txt
If you really need to use Python, wrap a grep call in subprocess or similar.
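A minimal sketch of that subprocess wrapper, in Python 3 (3.7+ for `capture_output`). It assumes a `grep` on the PATH that supports `-w` (GNU and BSD grep do; stock Windows has none), and `grep_matching_files` is a hypothetical helper name. Adding `-l` makes grep print only the names of the matching files, which is what the question asks for:

```python
import subprocess


def grep_matching_files(needles_file, paths):
    """Return the subset of `paths` containing a whole-word match for any
    fixed string listed (one per line) in `needles_file`.

    Runs: grep -l -w -F -f needles_file -- path...
    grep exits 0 if something matched, 1 if nothing matched, 2 on error.
    """
    if not paths:
        return []  # with no file arguments grep would read stdin
    result = subprocess.run(
        ["grep", "-l", "-w", "-F", "-f", needles_file, "--", *paths],
        capture_output=True,
        text=True,
    )
    # Any error (e.g. an unreadable file) raises, even if other files matched.
    if result.returncode > 1:
        raise RuntimeError(result.stderr.strip())
    return result.stdout.splitlines()
```

The list of `paths` can come from `os.walk` or `glob`; for a large tree, `grep -r` with `--include='*.txt'` (GNU grep) lets grep do the traversal itself.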
An analog of the shell command:
$ find /path/somefolder/ -name \*.txt -type f -exec grep -wFf /path/list {} +
in Python:
#!/usr/bin/env python
import os
import re
import sys

def files_with_matched_lines(topdir, matched):
    for root, dirs, files in os.walk(topdir, topdown=True):
        dirs[:] = [d for d in dirs if not d.startswith('.')] # skip "hidden" dirs
        for filename in files:
            if filename.endswith(".txt"):
                path = os.path.join(root, filename)
                try:
                    with open(path) as file:
                        for line in file:
                            if matched(line):
                                yield path
                                break
                except EnvironmentError as e:
                    print >>sys.stderr, e

with open('/path/list') as file:
    # skip blank lines: an empty alternative in the regex would match every line
    hosts = [line.strip() for line in file if line.strip()]
matched = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, hosts))).search
for path in files_with_matched_lines("/path/somefolder/", matched):
    print path
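The code above is Python 2 (`print` statements, `print >>sys.stderr`). A sketch of the same two pieces ported to Python 3 follows; `make_matcher` is a hypothetical helper that wraps the regex construction, and `errors="replace"` is an added assumption so that non-UTF-8 files don't raise a decode error. The `\b...\b` anchors make each name match only as a whole word (so "cat" does not match "concatenate"), provided the names start and end with word characters, and `re.escape` keeps characters such as `.` in a host name from acting as regex metacharacters:

```python
import os
import re
import sys


def files_with_matched_lines(topdir, matched):
    """Yield paths of *.txt files under topdir that have a line where matched(line) is truthy."""
    for root, dirs, files in os.walk(topdir, topdown=True):
        dirs[:] = [d for d in dirs if not d.startswith(".")]  # skip "hidden" dirs
        for filename in files:
            if filename.endswith(".txt"):
                path = os.path.join(root, filename)
                try:
                    with open(path, errors="replace") as file:
                        for line in file:
                            if matched(line):
                                yield path
                                break  # one match is enough for this file
                except OSError as e:
                    print(e, file=sys.stderr)


def make_matcher(names):
    """Return a search function that finds any of `names` as a whole word."""
    # Drop blank entries: an empty alternative would match at every word boundary.
    names = [n.strip() for n in names if n.strip()]
    if not names:
        raise ValueError("no names to search for")
    pattern = r"\b(?:%s)\b" % "|".join(map(re.escape, names))
    return re.compile(pattern).search
```

Usage mirrors the original: build `matched = make_matcher(open('/path/list'))` and loop over `files_with_matched_lines("/path/somefolder/", matched)`, printing each path with `print(path)`.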
I've made some minor changes to the algorithm provided by J.F. Sebastian: it now asks for user input, and it also runs on Windows without issues.
#!/usr/bin/env python
import os
import re
import sys

contents = raw_input("Please provide the full path and file name that contains the items you would like to search for \n")
print "\n"
print "\n"
direct = raw_input("Please provide the directory you would like to search. \
Use C:/ if you want to search the root directory on a Windows machine\n")

def files_with_matched_lines(topdir, matched):
    for root, dirs, files in os.walk(topdir, topdown=True):
        dirs[:] = [d for d in dirs if not d.startswith('.')] # skip "hidden" dirs
        for filename in files:
            if filename.endswith(".txt"):
                path = os.path.join(root, filename)
                try:
                    with open(path) as file:
                        for line in file:
                            if matched(line):
                                yield path
                                break
                except EnvironmentError as e:
                    print >>sys.stderr, e

with open(contents) as file:
    # skip blank lines: an empty alternative in the regex would match every line
    hosts = [line.strip() for line in file if line.strip()]
matched = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, hosts))).search
for path in files_with_matched_lines(direct, matched):
    print path