Python: no output when comparing a string with an imported large dictionary file

I'm trying to write code to help me with crossword puzzles. I'm running into the following problems.

1. When I use the much larger text file as my word list, I get no output; only the small three-word list works.

2. The match tests positive for the first two strings of my test word list. I need it to test true only for entire words in my word list. [SOLVED: solution in the code below]

lex.txt contains:

dad

add

test

I call the code as follows:
./cross.py dad

[SOLVED] The solution below works, but it is really slow.

#!/usr/bin/env python

import itertools, sys, re

sys.dont_write_bytecode = True
original_string=str(sys.argv[1])
lenth_of_string=len(original_string)
string_to_tuple=tuple(original_string)


with open('wordsEn.txt', 'r') as inF:
    for line in inF:
        for a in set (itertools.permutations(string_to_tuple, lenth_of_string)):
            joined_characters="".join(a)
            if re.search('\\b'+joined_characters+'\\b',line):
                print joined_characters

Let's take a look at your code. You take the input string, create all possible permutations of it, and then look for these permutations in the dictionary.

As I see it, the biggest cost is that you regenerate the permutations of the input word for every single word in your dictionary. For an 8-letter input that is up to 8! = 40,320 permutations, each followed by a regex search, on every line of the file. This is very time consuming.
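To illustrate that point, here is a minimal sketch that builds the permutation set only once and then does a plain set lookup per dictionary word. The function name `find_anagrams_precomputed` and the in-memory word list are assumptions made for the example; they are not part of the original script, which reads from a file.

```python
import itertools

def find_anagrams_precomputed(letters, words):
    """Return the entries of `words` that are permutations of `letters`."""
    # Build the permutation set once, outside the loop over the dictionary.
    candidates = set("".join(p) for p in itertools.permutations(letters))
    # Set membership is a hash lookup, so no regex search is needed per word.
    stripped = (line.strip() for line in words)
    return [w for w in stripped if w in candidates]

# Lines as they would come from a file such as lex.txt (trailing newlines included).
print(find_anagrams_precomputed("dad", ["dad\n", "add\n", "test\n"]))
# prints ['dad', 'add']
```

This still generates every permutation of the input, so it remains expensive for long inputs; it only removes the repeated work per dictionary line.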

Besides that, you don't even need the permutations. Two words can be "converted" into each other by permuting exactly when they contain the same letters, each the same number of times. So your piece of code can be reimplemented as follows:

import itertools, sys, re
import time
from collections import Counter


sys.dont_write_bytecode = True
original_string=str(sys.argv[1]).strip()
lenth_of_string=len(original_string)
string_to_tuple=tuple(original_string)

def original_impl():
    to_return = []
    with open('wordsEn.txt', 'r') as inF:
        for line in inF:
            for a in set (itertools.permutations(string_to_tuple, lenth_of_string)):
                joined_characters="".join(a)
                if re.search('\\b'+joined_characters+'\\b',line):
                    to_return.append(joined_characters)
    return to_return

def new_impl():
    to_return = []
    stable_counter = Counter(original_string)
    with open('wordsEn.txt', 'r') as inF:
        for line in inF:
            l = line.strip()
            c = Counter(l)
            if c == stable_counter:
                to_return.append(l)
    return to_return

t1 = time.time()
result1 = original_impl()
t2 = time.time()
result2 = new_impl()
t3 = time.time()

assert result1 == result2

print "Original impl took ", t2 - t1, ", new impl took ", t3 - t2, "i.e. new impl is ", (t2-t1) / (t3 - t2), " faster"

For a dictionary with 100 words of 8 letters, the output is:

Original impl took  42.1336319447 , new impl took  0.000784158706665 i.e. new impl is  53731.0006081  faster

With 10000 records in the dictionary, the time taken by the original implementation is unbearable.
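For readers who want to try the letter-count comparison in isolation, the following is a small sketch that works on an in-memory list instead of wordsEn.txt. The helper names `is_anagram` and `anagrams_in` are hypothetical and chosen only for this illustration.

```python
from collections import Counter

def is_anagram(a, b):
    # Same letters with the same multiplicities means one word
    # is a permutation of the other.
    return Counter(a) == Counter(b)

def anagrams_in(letters, words):
    # Count the input letters once, then compare against each word's counts.
    target = Counter(letters)
    return [w for w in words if Counter(w) == target]

print(is_anagram("dad", "add"))                    # True
print(is_anagram("dad", "da"))                     # False
print(anagrams_in("dad", ["dad", "add", "test"]))  # ['dad', 'add']
```

Each comparison costs time proportional to the word length, so the total work grows linearly with the dictionary size rather than with the number of permutations of the input.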
