
Searching for the strings in a text file inside another string

I'm taking a set of characters and generating their permutations, which yields a set of strings. I then want to check whether any of the strings in a particular text file occurs inside one of the resulting permutations. For example, if one of the permutations is gtryop and the word 'try' is in the text file, I want to detect that and print the word. However, my code doesn't seem to be working:

def permutations(items):

        n=len(items)
        if n==0:
            yield []
        else:
            for i in range(len(items)):
                for cc in permutations(items[:i]+items[i+1:]):
                    yield [items[i]] + cc


g = open("TextFIle.txt", "r")

x=raw_input('Input the letters: ')

for p in permutations(list(x)):

        q=''.join(p)
        for i in g:
            if i in q:
                print i

g.close()

The issue is that you read/consume the entire file while dealing with the first permutation. A brute force way of dealing with it would be to reset the internal pointer on the file each time you loop.

for p in permutations(list(x)):
    q = ''.join(p)
    for i in g:
        word = i.strip()  # drop the trailing newline, otherwise "try\n" never matches
        if word in q:
            print word
    g.seek(0)  # resetting the pointer back to the beginning of the file

Depending on what you're dealing with this may be OK, but you re-read the whole file once per permutation, and the number of permutations grows factorially with the number of letters (so it might be very slow).
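
To get a feel for how quickly this grows: the generator in the question yields n! orderings for n letters, and each one triggers a full pass over the file. Here is a rough sketch of the count, written for Python 3 (unlike the Python 2 snippets in this thread). The helper name `comparisons` and the figure of 1000 lines are made up for illustration, not taken from the question.

```python
import math

def comparisons(letter_count, line_count):
    """Upper bound on substring checks: one per (permutation, line) pair."""
    return math.factorial(letter_count) * line_count

# line_count=1000 is an assumed file size, purely for illustration
for n in (4, 6, 8, 10):
    print(n, comparisons(n, 1000))
```

Going from 8 to 10 letters multiplies the work by 90, which is why the seek-and-rescan approach only suits short inputs.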

EDIT: I've also just noticed that your description doesn't quite match the code. You say "see if any of those strings exist in a particular text file", but the code says i in q, i.e. it checks whether any line of the file is a substring of any of the permutations (which is basically the other way around).

Can you clarify what sort of permutations you have and what sort of lines of text you have? At the moment, for example, a blank line in the file (an empty string once its newline is stripped) would match any input.
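
Two details of the `in` test on strings are easy to miss here: lines read from a file keep their trailing newline, and the empty string is a substring of every string. Here is a small sketch in Python 3, where `io.StringIO` stands in for the text file and its contents are invented:

```python
import io

# io.StringIO stands in for the text file here; its contents are invented
fake_file = io.StringIO("try\n\ngo\n")
permutation = "gtryop"

# Raw lines still carry "\n", so none of them can occur inside the permutation
raw_hits = [line for line in fake_file if line in permutation]

fake_file.seek(0)  # rewind, since the first loop consumed the file

# After stripping, "try" matches, and so does the blank line (as "")
stripped_hits = [line.strip() for line in fake_file if line.strip() in permutation]

print(raw_hits)
print(stripped_hits)
```

With this sample data, the raw comparison finds nothing, while the stripped one reports both "try" and the empty string from the blank line.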

It sounds like you want something more like the following:

raw_letters = raw_input('Input the letters: ')

# read out the entire contents of the file to search over
with open('TextFIle.txt', 'r') as g:
    full_file = g.read()

# print each permutation that occurs somewhere in the file
for p in permutations(raw_letters):
    p_as_string = ''.join(p)
    if p_as_string in full_file:
        print(p_as_string)
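
As a side note, the standard library's `itertools.permutations` could stand in for the hand-written generator. The sketch below is Python 3 and makes a few assumptions of its own: the function name `permutations_in_text` and the sample `file_text` are invented, and it collects the orderings into a set so that repeated letters don't produce duplicate output.

```python
from itertools import permutations

def permutations_in_text(letters, text):
    """Return the distinct orderings of `letters` that occur somewhere in `text`."""
    candidates = {''.join(p) for p in permutations(letters)}
    return sorted(c for c in candidates if c in text)

# file_text is an invented stand-in for the contents of the text file
file_text = "try\nryt\nhello\n"
print(permutations_in_text("tyr", file_text))
```

The cost is still factorial in the number of letters; this only tidies the code and removes duplicates.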

EDIT 2: So I believe you need the behaviour of a Scrabble solver. You have a dictionary of words in a file, and people need to be able to input their tiles to find possible words.

I'm sure there are way better algorithms for this, but I'll have a stab at something relatively simple. The permutations idea is solid (and natural, given the problem), but it's really brute-force and inefficient.

The key insight is that since you're iterating over all the permutations anyway, the original order of the letters doesn't matter, so there might be a single fixed ordering you can use instead. In this case, you can reorder each word alphabetically (so try becomes rty); let's call that the word_signature. You can also order your tiles in the same way. Then for each word you can scan to see if the tiles can make it up.

# grab our word list as (original_word, word_signature) tuples
# you only have to do this once
words = []
with open('TextFIle.txt', 'r') as f:
    for word in f:
        word = word.strip()
        words.append((word, sorted(word)))


raw_letters = raw_input('Input the letters: ')
search_signature = sorted(raw_letters)

for word, word_signature in words:
    # these are the tiles we're searching over
    # pull it out like this so we can easily pop our way through
    remaining_search = list(reversed(search_signature))

    could_still_match = True
    found = []

    # this is going to look a bit horrible because you're sort of tracking
    # 2 cursors at the same time, incrementing them as you find matches, or not
    for letter in word_signature:

        while remaining_search:
            next_possible = remaining_search.pop()

            # either the possible letter is:
            # - the one we're looking for, that's great, move on to the next letter
            # - less than the one we're looking for, so we can skip it and look at
            #   another letter (by going through the next iteration of the while loop)
            # - greater than the one we're looking for, so this word is a non-match

            if next_possible == letter:
                found.append(next_possible)
                break

            if next_possible < letter:
                continue

            if next_possible > letter:
                could_still_match = False
                break

        # horrible little hack so we can break out of both loops
        if not could_still_match:
            break

    # if we found all the letters in the word then we have a match
    if len(found) == len(word_signature):
        print(word)
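
For comparison, the same "can these tiles spell this word" check can also be sketched with letter counts instead of two sorted cursors. This is a different technique from the scan above (it uses `collections.Counter` from the standard library), written for Python 3. The function names and the sample word list are invented for illustration.

```python
from collections import Counter

def can_build(word, tiles):
    """True if every letter of `word` is available in `tiles`, counting repeats."""
    needed = Counter(word)
    available = Counter(tiles)
    return all(available[letter] >= count for letter, count in needed.items())

def playable_words(words, tiles):
    """Filter `words` down to the non-empty ones that `tiles` can spell."""
    return [w for w in words if w and can_build(w, tiles)]

# sample_words is an invented stand-in for the stripped lines of the word file
sample_words = ["try", "tree", "port", ""]
print(playable_words(sample_words, "gtryop"))
```

Unlike the loop above, this sketch skips empty entries explicitly, so a blank line in the word file is not reported as a match.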

Read the file contents before searching it.

...
x = raw_input('Input the letters: ')
with open("TextFIle.txt", "r") as g:
    contents = g.read()  # read the whole file into one string

for p in permutations(list(x)):
    q = ''.join(p)
    if q in contents:
        print q
