I need to search a few thousand individual text files in a directory to see how many of them contain a given string, and I need to do it in Python. Right now I have the following basic code working for a single file. I can't figure out the next step: how to loop through the contents of each of the files in the directory. Here is what I have:
    stringtofind = 'FULL TEXT'        # enter something between the ''s
    filetolookin = '2013-04-061.txt'  # enter the file you want to search

    def countif(isthis, infile):
        count = 0
        if isthis in open(infile).read():
            count = 1 + count
            return count
        else:
            return count

    print countif(stringtofind, filetolookin)
Thanks for your help.
os.walk will let you recursively enumerate the files in a directory. Once you have the file names, use the functions in os.path (for example os.path.splitext) to get the file name and extension if you need to filter on those. For the file content, the re module will let you search for a regular-expression pattern line by line.
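A minimal Python 3 sketch of that approach, assuming you want to know how many files contain at least one line matching a pattern, optionally restricted to one extension. The helper name count_matching_files and its extension parameter are hypothetical, chosen for illustration only.

```python
import os
import re


def count_matching_files(root, pattern, extension=".txt"):
    """Count files under root (recursively) with at least one line matching pattern.

    Hypothetical helper: pass extension=None to search every file.
    """
    regex = re.compile(pattern)
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            # os.path.splitext("2013-04-061.txt") -> ("2013-04-061", ".txt")
            if extension is not None and os.path.splitext(name)[1] != extension:
                continue
            # os.walk yields bare names, so join each one with its directory
            path = os.path.join(dirpath, name)
            # errors="ignore" skips undecodable bytes instead of raising
            with open(path, encoding="utf-8", errors="ignore") as f:
                # any() stops reading the file at the first matching line
                if any(regex.search(line) for line in f):
                    count += 1
    return count
```

For a plain string rather than a regex, escape it first, e.g. count_matching_files("/path/to/dir", re.escape("FULL TEXT")). Reading line by line with an early stop avoids loading each file into memory at once.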
The simplest option is the glob module, provided all the files you want to search are in one folder/directory.
    import glob

    stringtofind = 'FULL TEXT'
    filetolookin = '2013*.txt'

    for f in glob.glob(filetolookin):
        icount = 0
        for j in open(f):
            if j.find(stringtofind) >= 0:
                print j
                icount = icount + 1
                # or whatever you want
        print "File: ", f, "count ", icount
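The loop above counts matching lines within each file. If, as in the question, you want the number of files that contain the string, a minimal Python 3 sketch using the same glob pattern could look like this (count_files_with is a hypothetical name):

```python
import glob


def count_files_with(stringtofind, pattern="2013*.txt"):
    """Return how many files matching the glob pattern contain stringtofind."""
    count = 0
    for path in glob.glob(pattern):
        # errors="ignore" skips undecodable bytes instead of raising
        with open(path, encoding="utf-8", errors="ignore") as f:
            # same whole-file membership test as the question's countif()
            if stringtofind in f.read():
                count += 1
    return count
```

Calling print(count_files_with('FULL TEXT')) from the directory holding the files prints the count; pass a full pattern such as '/data/texts/2013*.txt' to search elsewhere.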
This sounds perfectly suited to the fileinput module in the standard library:
    #!/usr/bin/env python
    usage = 'Call this with a search string and a list of files to search'

    if __name__ == '__main__':
        import sys, fileinput
        if len(sys.argv) < 3:
            print usage
            sys.exit()
        search_string = sys.argv[1]
        count = 0
        for line in fileinput.input(sys.argv[2:]):
            if search_string in line:
                count += 1
        print count
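fileinput chains all the files into a single stream of lines, so count in the script above ends up as the number of matching lines across every file. To count files instead, one option is FileInput.nextfile(), which abandons the rest of the current file once a match is found. A minimal Python 3 sketch (the function name count_files_containing is hypothetical):

```python
import fileinput


def count_files_containing(search_string, paths):
    """Count how many of the given files contain search_string on some line."""
    if not paths:
        # fileinput would fall back to reading stdin for an empty file list
        return 0
    count = 0
    with fileinput.FileInput(files=paths) as lines:
        for line in lines:
            if search_string in line:
                count += 1
                # skip the rest of this file so it is counted only once
                lines.nextfile()
    return count
```

From a script this would be called as count_files_containing(sys.argv[1], sys.argv[2:]), keeping the same command-line interface as the answer.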
This is a complete working script for this question, using Python 2.7.x:
    import sys
    import os
    import re

    def search_count(pattern, loc):
        count = 0
        os.chdir(loc)
        for (thisDir, subsHere, filesHere) in os.walk('.'):
            for filename in filesHere:
                # os.walk yields bare file names; join each with its directory
                with open(os.path.join(thisDir, filename), "r") as f:
                    content = f.read()
                    if re.search(pattern, content):
                        count += 1
        return count

    if __name__ == "__main__":
        stringtofind = raw_input('Enter text to search: ')
        pathtolookin = raw_input('Enter path to search: ')
        if sys.platform[:3] == 'win':
            pathtolookin = pathtolookin.replace('\\', '/')
        print search_count(stringtofind, pathtolookin)
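For Python 3, a minimal port of search_count might look like the following, under a few assumptions of my own: it walks the path directly instead of calling os.chdir, reads files as UTF-8 while ignoring undecodable bytes, and by default treats the search text literally via re.escape (the literal parameter is a hypothetical addition; set it to False to enter a regular expression).

```python
import os
import re


def search_count(text, loc, literal=True):
    """Count files under loc whose contents match text (Python 3 sketch)."""
    # re.escape makes characters such as "(" or "." match literally
    pattern = re.compile(re.escape(text) if literal else text)
    count = 0
    for this_dir, subs_here, files_here in os.walk(loc):
        for filename in files_here:
            path = os.path.join(this_dir, filename)
            # errors="ignore" keeps oddly encoded files from raising
            with open(path, encoding="utf-8", errors="ignore") as f:
                if pattern.search(f.read()):
                    count += 1
    return count
```

Interactive use mirrors the original: print(search_count(input('Enter text to search: '), input('Enter path to search: '))). Forward slashes are not required on Windows, since os.walk and os.path.join accept native paths.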