Python: Integrate multiprocessing in multiple nested loops
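Due to my lack of coding skill, I am making a very inefficient Scrabble word generator. In this program, the user enters a series of letters and the program uses brute force to find every valid Scrabble word. To speed this up I would like to implement multiprocessing, but I have not been able to get it to work. The working non-multiprocessing code is below: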
from multiprocessing import Process
usrList = input("type the letters you have ")
usrList = list(usrList.upper())
usrList.sort()
print(usrList)
storedList = []
def word2(usrList):
    print('trying to find two letter words')
    for i in range(0,len(usrList)):
        for j in range(0,len(usrList)):
            if i != j:
                if str(usrList[i])+str(usrList[j]) not in storedList and str(usrList[i])+str(usrList[j])+'\n' in dicList:
                    print(str(usrList[i])+str(usrList[j]))
                    storedList.append(str(usrList[i])+str(usrList[j]))
def word3(usrList):
    print('trying to find three letter words')
    if len(usrList) > 2:
        for i in range(0,len(usrList)):
            for j in range(0,len(usrList)):
                for k in range(0,len(usrList)):
                    if i != j and i != k and j != k:
                        if str(usrList[i])+str(usrList[j])+str(usrList[k]) not in storedList and str(usrList[i])+str(usrList[j])+str(usrList[k])+'\n' in dicList:
                            print(str(usrList[i])+str(usrList[j])+str(usrList[k]))
                            storedList.append(str(usrList[i])+str(usrList[j])+str(usrList[k]))
def word4(usrList):
    print('trying to find four letter words')
    if len(usrList) > 3:
        for i in range(0,len(usrList)):
            for j in range(0,len(usrList)):
                for k in range(0,len(usrList)):
                    for l in range(0,len(usrList)):
                        if i !=j and i != k and i!= l and j!= k and j!= l and k != l:
                            if str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l]) not in storedList and str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+'\n' in dicList:
                                print(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l]))
                                storedList.append(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l]))
def word5(usrList):
    print('trying to find five letter words')
    if len(usrList) > 4:
        for i in range(0,len(usrList)):
            for j in range(0,len(usrList)):
                for k in range(0,len(usrList)):
                    for l in range(0,len(usrList)):
                        for m in range(0,len(usrList)):
                            if i !=j and i != k and i!= l and i != m and j!= k and j!= l and j!= m and k != l and k != m and l !=m:
                                if str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m]) not in storedList and str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+'\n' in dicList:
                                    print(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m]))
                                    storedList.append(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m]))
def word6(usrList):
    print('trying to find six letter words')
    if len(usrList) > 5:
        for i in range(0,len(usrList)):
            for j in range(0,len(usrList)):
                for k in range(0,len(usrList)):
                    for l in range(0,len(usrList)):
                        for m in range(0,len(usrList)):
                            for n in range(0,len(usrList)):
                                if i !=j and i != k and i!= l and i != m and i != n and j!= k and j!= l and j!= m and j !=n and k != l and k != m and k != n and l !=m and l != n and m!= n:
                                    if str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n]) not in storedList and str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n])+'\n' in dicList:
                                        print(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n]))
                                        storedList.append(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n]))
def word7(usrList):
    print('trying to find seven letter words')
    if len(usrList) > 6:
        for i in range(0,len(usrList)):
            for j in range(0,len(usrList)):
                for k in range(0,len(usrList)):
                    for l in range(0,len(usrList)):
                        for m in range(0,len(usrList)):
                            for n in range(0,len(usrList)):
                                for o in range(0,len(usrList)):
                                    if i !=j and i != k and i!= l and i != m and i != n and i != o and j!= k and j!= l and j!= m and j !=n and j != o and k != l and k != m and k != n and k!= o and l !=m and l != n and l != o and m!= n and m != o and n != o:
                                        if str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n])+str(usrList[o]) not in storedList and str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n])+str(usrList[o])+'\n' in dicList:
                                            print(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n])+str(usrList[o]))
                                            storedList.append(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n])+str(usrList[o]))
f = 'ScrabbleDic.txt'
with open(f,'r') as file:  # the with block closes the file automatically
    dicList=[]
    for line in file:
        dicList.append(line)  # each entry keeps its trailing '\n'
if __name__ == '__main__':
    word7(usrList)
    word6(usrList)
    word5(usrList)
    word4(usrList)
    word3(usrList)
    word2(usrList)
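In general, you often get more value from redesigning your algorithm than from using multiprocessing.
Here's a shorter implementation of your code. I've hard-coded usrList, and since I don't have access to the dictionary file you're using, I'm using the default dictionary file that ships with macOS. Instead of writing nested loops and checking for repeated indices, I use the itertools module to generate all permutations of usrList of a given length. This doesn't speed the code up significantly, but it makes the possible changes easier to demonstrate: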
import itertools
usrList = ['P', 'Y', 'T', 'H', 'O', 'N', 'S']
storedList = []
with open('/usr/share/dict/words', 'r') as dict_file:
    dicList = [word.strip().upper() for word in dict_file]
def possible_words(length):
    for letter_permutation in itertools.permutations(usrList, length):
        word = ''.join(letter_permutation) # itertools returns a tuple, not a string
        if word in dicList: # This requires a linear search through the list
            storedList.append(word)
for word_length in range(2, 8): # Note that the upper bound is 7 letters, not 8
    possible_words(word_length)
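One detail worth illustrating: itertools.permutations treats elements as distinct by position, not by value, so a rack with repeated letters yields repeated strings. The question's code guarded against that with its `not in storedList` check. The sketch below uses a made-up three-letter rack ("AAB", an assumption for illustration only) to show the effect and one way to de-duplicate with a set:

```python
import itertools

# Hypothetical rack with a repeated letter, chosen only for illustration.
rack = "AAB"

# permutations() works on positions, so the two 'A's count as different items.
all_perms = ["".join(p) for p in itertools.permutations(rack, 2)]

# Collapsing into a set removes the repeated strings.
unique_words = sorted(set(all_perms))

print(all_perms)
print(unique_words)
```

With distinct letters such as P, Y, T, H, O, N, S this makes no difference, but for arbitrary user input collecting results in a set avoids duplicate entries.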
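On my MacBook, the itertools version above takes about 47.4 seconds to run. To speed it up, let's add multiprocessing as you suggested. There are several ways to use multiprocessing, but the easiest to implement is probably to create a Pool and call its map() function.
If you aren't used to functions that take other functions as arguments, this syntax may look a little strange. In effect, we create a pool of workers and then give that pool a function and a sequence of arguments to use with that function. The individual function calls are then split across the pool instead of being called sequentially: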
import itertools
import multiprocessing
usrList = ['P', 'Y', 'T', 'H', 'O', 'N', 'S']
storedList = []
with open('/usr/share/dict/words', 'r') as dict_file:
    dicList = [word.strip().upper() for word in dict_file]
def possible_words(length):
    for letter_permutation in itertools.permutations(usrList, length):
        word = ''.join(letter_permutation)
        if word in dicList:
            # Note: each worker process appends to its own copy of storedList;
            # the parent process's list is not updated.
            storedList.append(word)
if __name__ == '__main__': # multiprocessing complains if this isn't isolated
    with multiprocessing.Pool(6) as p: # Creates 6 worker processes
        p.map(possible_words, range(2, 8)) # Each process calls possible_words() with a different input
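Because worker processes do not share the parent's storedList, one way to get the results back is to have the function return its words and let map() collect them. The following is a minimal sketch of that variation, not the code that was timed here. The names words_of_length, find_all_words and SAMPLE_WORDS are hypothetical, and the small word set stands in for the real dictionary file:

```python
import itertools
import multiprocessing
from functools import partial

# Hypothetical stand-in for the dictionary file (upper-case words only).
SAMPLE_WORDS = frozenset({
    "NO", "ON", "SO", "HOP", "POT", "TOP", "OPT",
    "SHOP", "POSH", "PYTHON", "PYTHONS",
})

def words_of_length(length, letters, valid_words):
    """Return the sorted, de-duplicated valid words using `length` of the letters."""
    found = set()
    for letter_permutation in itertools.permutations(letters, length):
        word = "".join(letter_permutation)
        if word in valid_words:
            found.add(word)
    return sorted(found)

def find_all_words(letters, valid_words, processes=2):
    """Run one task per word length in a pool and flatten the returned lists."""
    worker = partial(words_of_length, letters=letters, valid_words=valid_words)
    with multiprocessing.Pool(processes) as pool:
        per_length = pool.map(worker, range(2, len(letters) + 1))
    return [word for words in per_length for word in words]

if __name__ == "__main__":
    print(find_all_words("PYTHONS", SAMPLE_WORDS))
```

Here map() returns one list per word length, in the same order as the inputs, so the parent process ends up with every result without any shared state.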
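With the six-worker pool, this runs in 32.3 seconds on my MacBook. That cuts the runtime by roughly a third (from 47.4 to 32.3 seconds), and there are probably ways to squeeze more performance out of this approach. But it's also worth looking at the algorithm to see whether there are other ways to speed things up.
Right now, you're building a list of dictionary words. When you check whether a candidate word is in that list, Python has to scan the list until it finds a match or reaches the end. My built-in dictionary has 235K words, so that means up to 235K string comparisons for every nonsense letter combination it generates!
If you switch from a list to a set, Python can instead use a hash function to look up a value in nearly constant time, rather than in time proportional to the number of entries. Let's try that instead of multiprocessing: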
import itertools
usrList = ['P', 'Y', 'T', 'H', 'O', 'N', 'S']
storedList = []
with open('/usr/share/dict/words', 'r') as dict_file:
    dicSet = {word.strip().upper() for word in dict_file} # By changing [] to {}, this is now a set
def possible_words(length):
    for letter_permutation in itertools.permutations(usrList, length):
        word = ''.join(letter_permutation)
        if word in dicSet: # This now only does 1 check, not 235,000
            storedList.append(word)
for word_length in range(2, 8):
    possible_words(word_length)
This version runs in 0.005 seconds, essentially by changing two characters ([] to {})!
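To see the list-versus-set difference in isolation, a rough micro-benchmark with timeit can help. This is only a sketch: the synthetic word list and the probe string are made up for illustration, and the absolute timings will vary from machine to machine.

```python
import timeit

# Synthetic stand-in for a large dictionary (hypothetical data, not real words).
words_list = [f"WORD{i}" for i in range(200_000)]
words_set = set(words_list)

# A string that is absent, so the list lookup has to scan every entry.
probe = "NOTAWORD"

# Time 20 membership tests against each container.
list_time = timeit.timeit(lambda: probe in words_list, number=20)
set_time = timeit.timeit(lambda: probe in words_set, number=20)

print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

A missing word is the worst case for the list and also the common case in the brute-force search, since most letter permutations are not real words.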
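In conclusion, multiprocessing is a useful tool, but it probably shouldn't be the first thing you try. You'll usually get better results by thinking about the data structures and algorithms you're using and where the bottlenecks might be.
The classic solution to this kind of puzzle is not to check every possible permutation, but to convert both the sample letters and the dictionary words into a consistent, searchable arrangement, by sorting their characters!
Now, instead of searching the dictionary for every permutation of "PYTHONS", you just sort the letters to create the key "HNOPSTY", and all valid words that have the same key will be found in the map.
With defaultdict, it's easy to create a lookup map of all the words in the dictionary. We use defaultdict(list) rather than a plain dict because multiple words may sort to the same key.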
from collections import defaultdict
dictionary_mapping = defaultdict(list)
# assuming dictionary is a list of all valid words, regardless of length
for word in dictionary:
    key = ''.join(sorted(word.upper()))
    dictionary_mapping[key].append(word)
search_word = "PYTHONS"
search_key = ''.join(sorted(search_word.upper()))
# get all words that are anagrams of the search word, or the empty list if none
print(dictionary_mapping.get(search_key, []))
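The lookup above only returns words that use all of the search letters. To also find shorter words, as the original question wants, one possible extension is to look up every combination of the sorted letters rather than just the full key. The sketch below assumes a tiny made-up word list (SAMPLE_DICTIONARY), and the helper names build_mapping and words_from_letters are hypothetical:

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical stand-in for the real dictionary file.
SAMPLE_DICTIONARY = [
    "no", "on", "hop", "pot", "top", "opt",
    "shop", "posh", "python", "pythons", "typhons",
]

def build_mapping(words):
    """Map each sorted-letter key to the upper-case words that share it."""
    mapping = defaultdict(list)
    for word in words:
        upper = word.upper()
        mapping["".join(sorted(upper))].append(upper)
    return mapping

def words_from_letters(letters, mapping, min_length=2):
    """Return every mapped word buildable from a subset of `letters`."""
    sorted_letters = sorted(letters.upper())
    found = set()
    for length in range(min_length, len(sorted_letters) + 1):
        # combinations() of a sorted sequence yields sorted tuples,
        # so each one is already a valid lookup key.
        for combo in set(combinations(sorted_letters, length)):
            found.update(mapping.get("".join(combo), []))
    return sorted(found)

mapping = build_mapping(SAMPLE_DICTIONARY)
print(words_from_letters("PYTHONS", mapping))
```

For a seven-letter rack this checks at most 120 keys (every subset of two or more letters), compared with thousands of permutations in the brute-force versions.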