Why is grep not finding string when definitely in file?

First time posting and a bit of a noob, so if there are any problems with etiquette or formatting, do let me know.

I'm trying to use grep on a file to check whether a word is present in it. The word is definitely present, as I've viewed the file. It's surrounded by spaces and is the last word in a line.

For some reason, grep can't find the word and the programme is returning 0. Why?

Thanks!

import os
import re

word = "aliows"
folder = '/Users/jordanfreedman/Thinkful/Projects/Spam_Filter/enron1/spam/'
email = '4201.2005-04-05.GP.spam.txt'

number = int(os.popen("grep -w -i -l " + word + " " + folder + email + " | wc -l").read())
print number

You could find out whether there is a match using the exit status:

import os
from subprocess import STDOUT, call

path = os.path.join(folder, email)
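with open(os.devnull, 'wb', 0) as devnull:
    # Discard grep's output; only the exit status matters here
    rc = call(['grep', '-w', '-l', '-i', '-F', word, path],
              stdout=devnull, stderr=STDOUT)
# grep exits with 0 on a match, 1 on no match, and >1 on error
if rc == 0:
    print('found')
elif rc == 1:
    print('not found')
else:
    print('error')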

Or, as @stevieb mentioned, you could check whether the word is in a given file in pure Python:

import re
from contextlib import closing
from mmap import ACCESS_READ, mmap

# Open in binary mode and use a bytes pattern so this also runs on Python 3
with open(path, 'rb') as f, closing(mmap(f.fileno(), 0, access=ACCESS_READ)) as m:
    if re.search(br"(?i)\b" + re.escape(word.encode()) + br"\b", m):
        print('found')
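
If memory-mapping is not needed, a line-by-line scan should give the same answer. The following is only a minimal sketch and not part of the original answer: the helper name file_contains_word is hypothetical, and it assumes the word is plain ASCII. It reads the file in binary mode so that stray non-UTF-8 bytes in an email cannot raise a decoding error.

```python
import re


def file_contains_word(path, word):
    """Return True if `word` occurs in the file as a whole word, ignoring case.

    Hypothetical helper for illustration; assumes `word` is ASCII.
    """
    # Build a bytes pattern because the file is read in binary mode.
    pattern = re.compile(br"\b" + re.escape(word.encode("ascii")) + br"\b",
                         re.IGNORECASE)
    with open(path, "rb") as f:
        # Scan one line at a time so large files are not loaded whole.
        return any(pattern.search(line) for line in f)
```

With the variables from the question, this could be called as file_contains_word(os.path.join(folder, email), word).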

You'll need to post a snippet of the file so we can test the grep command. Also, there's no reason to shell out:

import re

word = "aliows"
folder = '/Users/jordanfreedman/Thinkful/Projects/Spam_Filter/enron1/spam/'
email = '4201.2005-04-05.GP.spam.txt'

path = folder + email

# Read the whole file and collect every occurrence of the word
with open(path, 'r') as fh:
    contents = re.findall(word, fh.read())

print(len(contents))
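
Note that re.findall(word, ...) as used above is a case-sensitive substring search, so it is not quite the same test as grep -w -i. A rough sketch of a closer equivalent might look like this; the function name count_word and the sample text are made up for illustration, and \b is assumed to be close enough to grep's notion of a word boundary.

```python
import re


def count_word(text, word):
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    # re.escape keeps characters such as '.' or '+' in `word` literal.
    pattern = r"\b" + re.escape(word) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


sample = "It ALIOWS you, Aliows me, aliowsx"  # made-up sample text
print(count_word(sample, "aliows"))       # 2: "ALIOWS" and "Aliows"
print(len(re.findall("aliows", sample)))  # 1: only the substring inside "aliowsx"
```

The two counts differ because the plain pattern ignores neither case nor word boundaries, which is worth keeping in mind when comparing the Python result against grep.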
