This is my first time posting and I'm a bit of a noob, so if there are any problems with etiquette or formatting, do let me know.
I'm trying to use grep on a file (image below) to check whether a word is present in it. The word is definitely present, as I've viewed the file. It's surrounded by spaces and is the last word on a line.
For some reason, grep can't find the word and the program prints 0. Why?
Thanks!
import os
import re
word = "aliows"
folder = '/Users/jordanfreedman/Thinkful/Projects/Spam_Filter/enron1/spam/'
email = '4201.2005-04-05.GP.spam.txt'
number = int(os.popen("grep -w -i -l " + word + " " + folder + email + " | wc -l").read())
print number
You could find out whether there is a match using the exit status:
import os
from subprocess import STDOUT, call

path = os.path.join(folder, email)
# Discard grep's output; only its exit status matters here.
with open(os.devnull, 'wb', 0) as devnull:
    rc = call(['grep', '-w', '-l', '-i', '-F', word, path],
              stdout=devnull, stderr=STDOUT)
# grep exits with 0 on a match, 1 on no match, and >1 on error.
if rc == 0:
    print('found')
elif rc == 1:
    print('not found')
else:
    print('error')
Or, as @stevieb mentioned, you could find whether the word is in a given file in pure Python:
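import re
from contextlib import closing
from mmap import ACCESS_READ, mmap

# Open in binary mode: the mmap is searched with a bytes pattern.
with open(path, 'rb') as f, closing(mmap(f.fileno(), 0, access=ACCESS_READ)) as m:
    # Encode the escaped word so the bytes pattern also formats on Python 3.
    if re.search(br"(?i)\b%s\b" % re.escape(word).encode(), m):
        print('found')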
You'll need to post a snippet of the file so we can test the grep statement. Also, there's no reason to shell out:
import re
word = "aliows"
folder = '/Users/jordanfreedman/Thinkful/Projects/Spam_Filter/enron1/spam/'
email = '4201.2005-04-05.GP.spam.txt'
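file = folder + email
# Use a context manager so the file handle is closed after reading.
with open(file, 'r') as fh:
    # findall returns every non-overlapping occurrence of the pattern.
    contents = re.findall(word, fh.read())
print(len(contents))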
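Note that a bare `re.findall(word, ...)` is case-sensitive and also matches inside longer words, so it does not behave quite like `grep -w -i`. A minimal sketch of a closer equivalent follows, assuming Python 3. The helper name `word_in_file` is hypothetical (it does not appear in the posts above), and `\b` only approximates grep's `-w`:

```python
import re


def word_in_file(path, word):
    """Return True if `word` occurs in the file as a whole word, ignoring case."""
    # re.escape keeps any regex metacharacters in `word` literal;
    # \b on both sides approximates grep's -w (whole-word) matching.
    pattern = re.compile(r"\b%s\b" % re.escape(word), re.IGNORECASE)
    # errors="replace" is an assumption, so stray bytes in an email do not raise.
    with open(path, "r", errors="replace") as fh:
        # Scan line by line and stop at the first matching line.
        return any(pattern.search(line) for line in fh)
```

With the variables from the question, this could be called as `word_in_file(folder + email, word)`.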