
With a dictionary, search for strings in another text file and print the entire line

I want to check whether any word from a dictionary file appears in a second text file. I have a problem with the following code:

print 'Searching for known strings...\n'
with open('something.txt') as f:
    haystack = f.read()
with open('d:\\Users\\something\\Desktop\\something\\dictionary\\entirelist.txt') as f:
    for needle in (line.strip() for line in f):
        if needle in haystack:
            print line

The with open statements are not mine; I took them from: Search for strings listed in one file from another text file? I want to print the line, so I wrote line instead of needle. The problem: it says line is not defined.

My final objective is to see whether any word from the dictionary is in "something.txt" and, if so, print the line where the word was found.

It looks like you've used a generator expression: (line.strip() for line in f). You can't access its inner variable line from outside the generator's scope, i.e., outside the parentheses.

Try something like:

for line in f:
    if line.strip() in haystack:
        print line
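
The scoping rule can be checked in isolation. Here is a minimal sketch that uses an in-memory list in place of the file; the function name line_is_visible_outside and the sample words are made up for illustration:

```python
def line_is_visible_outside(words):
    """Return True if the generator's loop variable can be read after the loop."""
    for needle in (line.strip() for line in words):
        pass  # consume the generator; `line` only exists inside it
    try:
        line  # the same name lookup that the failing `print line` performs
    except NameError:
        return False
    return True


print(line_is_visible_outside(["falcon\n", "eagle\n"]))
```

This should print False: needle is an ordinary loop variable, but line lives only inside the generator expression.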

The specific exception you asked about is because line doesn't exist outside the generator expression. If you want to access it, you need to keep it in the same scope as the print statement, like this:

for line in f:
    needle = line.strip()
    if needle in haystack:
        print line

But this isn't going to be particularly useful. It's just going to print the word from needle plus the newline at the end. If you want to print the line (or lines?) from haystack that include needle, you have to search for that line, not just ask whether needle appears anywhere in the whole haystack.

To literally do what you're asking for, you're going to need to loop over the lines of haystack and check each one for needle. Like this:

# Read the text to search as a list of lines
with open('something.txt') as f:
    haystacks = list(f)

with open('d:\\Users\\something\\Desktop\\something\\dictionary\\entirelist.txt') as f:
    for line in f:
        needle = line.strip()
        for haystack in haystacks:
            if needle in haystack:
                # Drop the line's own newline; print adds one already
                print haystack.rstrip('\n')
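
The same idea can be tried without any files. The sketch below is one possible variation, not the code above: it swaps the loop order so that each matching line is reported once, in file order, even if several words hit it, and it skips blank dictionary lines because an empty string is a substring of every line. The name lines_containing and the sample data are assumptions for illustration.

```python
def lines_containing(needles, haystack_lines):
    """Return each haystack line (newline stripped) that contains any needle."""
    # Skip blank dictionary entries: '' is a substring of every string.
    needles = [n.strip() for n in needles if n.strip()]
    matches = []
    for haystack in haystack_lines:
        if any(needle in haystack for needle in needles):
            matches.append(haystack.rstrip('\n'))
    return matches


text = ["The Falcon flew north.\n", "Nothing to see here.\n", "An eagle and a Falcon.\n"]
words = ["Falcon\n", "\n", "eagle\n"]
for hit in lines_containing(words, text):
    print(hit)
```

With real files, list(f) for each of the two open files could be passed in place of the sample lists.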

However, there's a neat trick you may want to consider: if you can write a regular expression that matches any complete line that includes needle, then you just need to print out all the matches. Like this:

import re

with open('something.txt') as f:
    haystack = f.read()
with open('d:\\Users\\something\\Desktop\\something\\dictionary\\entirelist.txt') as f:
    for line in f:
        needle = line.strip()
        # Match any whole line containing needle; re.escape treats it as literal text
        pattern = '^.*{}.*$'.format(re.escape(needle))
        for match in re.finditer(pattern, haystack, re.MULTILINE):
            print match.group(0)

Here's an example of how the regular expression works:

^.*Falco.*$
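
To see what that pattern picks out, here is a small sketch run against an in-memory string; the sample text is invented for illustration:

```python
import re

haystack = "Millennium Falcon\nX-wing\nFalco peregrinus\n"
pattern = '^.*{}.*$'.format(re.escape('Falco'))

# With re.MULTILINE, ^ and $ anchor at every line boundary, and '.' never
# crosses a newline, so each match is exactly one full line.
matched_lines = [m.group(0) for m in re.finditer(pattern, haystack, re.MULTILINE)]
print(matched_lines)
```

This should print ['Millennium Falcon', 'Falco peregrinus']: both lines contain "Falco" somewhere, and the match covers the whole line rather than just the word.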

Of course, if you want to search case-insensitively, or only search for complete words, etc., you'll need to make some minor changes; see the Regular Expression HOWTO, or a third-party tutorial, for more.
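
As one possible sketch of those changes, the function below combines re.IGNORECASE with \b word boundaries. The name whole_word_lines and the sample text are assumptions for illustration, and \b only behaves as intended when needle starts and ends with a word character (letter, digit, or underscore).

```python
import re


def whole_word_lines(needle, haystack):
    """Return full lines of haystack containing needle as a whole word, ignoring case."""
    # \b marks a word boundary, so 'falco' no longer matches inside 'Falcon'.
    pattern = r'^.*\b{}\b.*$'.format(re.escape(needle))
    flags = re.MULTILINE | re.IGNORECASE
    return [m.group(0) for m in re.finditer(pattern, haystack, flags)]


sample = "Millennium Falcon\nFALCO peregrinus\nfalconry\n"
print(whole_word_lines('falco', sample))
```

This should print ['FALCO peregrinus']: the other two lines contain "falco" only as part of a longer word.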
