

Python: Integrate multiprocessing in multiple nested loops

I am making a Scrabble word generator that is incredibly inefficient due to my lack of coding skill. In this program the user enters a series of letters and the program uses brute force to find every valid Scrabble word. To speed this process up I want to use multiprocessing, but I can't get it to work. The working non-multiprocessing code is below:

from multiprocessing import Process
usrList = input("type the letters you have     ")
usrList = list(usrList.upper())
usrList.sort()
print(usrList)    


storedList = []

def word2 (usrList):
    print('trying to find two letter words')
    for i in range(0,len(usrList)):
        for j in range(0,len(usrList)):
            if i != j:
                if str(usrList[i])+str(usrList[j]) not in storedList and str(usrList[i])+str(usrList[j])+'\n' in dicList:
                    print(str(usrList[i])+str(usrList[j]))
                    storedList.append(str(usrList[i])+str(usrList[j]))

def word3(usrList):
    print('trying to find three letter words')
    if len(usrList) > 2:
        for i in range(0,len(usrList)):
            for j in range(0,len(usrList)):
                for k in range(0,len(usrList)):
                    if i != j and i != k and j != k:
                        if  str(usrList[i])+str(usrList[j])+str(usrList[k]) not in storedList and str(usrList[i])+str(usrList[j])+str(usrList[k])+'\n' in dicList :
                            print(str(usrList[i])+str(usrList[j])+str(usrList[k]))
                            storedList.append(str(usrList[i])+str(usrList[j])+str(usrList[k]))

def word4(usrList):
    print('trying to find four letter words')
    if len(usrList) > 3:
        for i in range(0,len(usrList)):
            for j in range(0,len(usrList)):
                for k in range(0,len(usrList)):
                    for l in range(0,len(usrList)):
                        if i !=j and i != k and i!= l and j!= k and j!= l and k != l:
                            if str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l]) not in storedList and str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+'\n' in dicList: 
                                print(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l]))
                                storedList.append(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l]))


def word5(usrList):
    print('trying to find five letter words')
    if len(usrList) > 4:
        for i in range(0,len(usrList)):
            for j in range(0,len(usrList)):
                for k in range(0,len(usrList)):
                    for l in range(0,len(usrList)):
                        for m in range(0,len(usrList)):
                            if i !=j and i != k and i!= l and i != m and j!= k and j!= l and j!= m and k != l and k != m and l !=m:
                                if str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m]) not in storedList and str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+'\n' in dicList:
                                    print(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m]))
                                    storedList.append(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m]))


def word6(usrList):
    print('trying to find six letter words')
    if len(usrList) > 5:
        for i in range(0,len(usrList)):
            for j in range(0,len(usrList)):
                for k in range(0,len(usrList)):
                    for l in range(0,len(usrList)):
                        for m in range(0,len(usrList)):
                            for n in range(0,len(usrList)):
                                if i !=j and i != k and i!= l and i != m and i != n and j!= k and j!= l and j!= m and j !=n and k != l and k != m and k != n and l !=m and l != n and m!= n:
                                    if str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n]) not in storedList and str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n])+'\n' in dicList:
                                        print(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n]))
                                        storedList.append(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n]))

def word7(usrList):
    print('trying to find seven letter words')
    if len(usrList) > 6:
        for i in range(0,len(usrList)):
            for j in range(0,len(usrList)):
                for k in range(0,len(usrList)):
                    for l in range(0,len(usrList)):
                        for m in range(0,len(usrList)):
                            for n in range(0,len(usrList)):
                                for o in range(0,len(usrList)):
                                    # compare against the loop variable o, not the number 0
                                    if i != j and i != k and i != l and i != m and i != n and i != o and j != k and j != l and j != m and j != n and j != o and k != l and k != m and k != n and k != o and l != m and l != n and l != o and m != n and m != o and n != o:
                                        if str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n])+str(usrList[o]) not in storedList and str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n])+str(usrList[o])+'\n' in dicList :
                                            print(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n])+str(usrList[o]))
                                            storedList.append(str(usrList[i])+str(usrList[j])+str(usrList[k])+str(usrList[l])+str(usrList[m])+str(usrList[n])+str(usrList[o]))        



f = 'ScrabbleDic.txt'
with open(f,'r') as file:
    dicList=[]
    for line in file:
        dicList.append(line)

if __name__ == '__main__':
    word7(usrList)
    word6(usrList)
    word5(usrList)
    word4(usrList)
    word3(usrList)
    word2(usrList)

In general, you'll often get more value out of redesigning your algorithm than out of using multiprocessing.

Here's a shorter implementation of your code. I've hardcoded usrList, and since I don't have access to the dictionary file you're using, I'm using the default dictionary file that ships with macOS. Instead of writing nested loops and checking for duplicate indices, I'm using the itertools module to generate all permutations of usrList of a given length. This won't meaningfully speed up the code, but it makes it easier to demonstrate possible changes:

import itertools

usrList = ['P', 'Y', 'T', 'H', 'O', 'N', 'S']
storedList = []
with open('/usr/share/dict/words', 'r') as dict_file:
    dicList = [word.strip().upper() for word in dict_file]


def possible_words(length):
    for letter_permutation in itertools.permutations(usrList, length):
        word = ''.join(letter_permutation)  # itertools returns a tuple, not a string
        if word in dicList:  # This requires a linear search through the list
            storedList.append(word)


for word_length in range(2, 8):  # Note that the upper bound is 7 letters, not 8
    possible_words(word_length)
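The nested loops in the question, with their `i != j` style checks, generate exactly the index tuples that `itertools.permutations` produces. The short check below is an illustration added here (the helper name `nested_loop_indices` is hypothetical); it compares the two approaches for three loops over five items:

```python
import itertools


def nested_loop_indices(n):
    """Index triples produced by three nested loops with the i != j != k checks."""
    result = []
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if i != j and i != k and j != k:
                    result.append((i, j, k))
    return result


perm_indices = list(itertools.permutations(range(5), 3))
print(nested_loop_indices(5) == perm_indices)  # → True
print(len(perm_indices))  # → 60, i.e. 5 * 4 * 3
```

Because `permutations()` yields tuples in lexicographic order of the input positions, the two lists match element for element, not just as sets.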

This takes about 47.4 seconds to run on my MacBook. To speed it up, let's add multiprocessing as you suggested. There are a few ways to use multiprocessing, but the easiest to implement is probably creating a Pool and calling its map() method.

This syntax can look a bit odd if you aren't used to functions that take other functions as arguments. Effectively, we're creating a pool of workers, then giving that pool a function and a range of arguments to call it with. The individual function calls are then split across the pool instead of being run sequentially:

import itertools
import multiprocessing

usrList = ['P', 'Y', 'T', 'H', 'O', 'N', 'S']
storedList = []
with open('/usr/share/dict/words', 'r') as dict_file:
    dicList = [word.strip().upper() for word in dict_file]


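def possible_words(length):
    found = []
    for letter_permutation in itertools.permutations(usrList, length):
        word = ''.join(letter_permutation)
        if word in dicList:
            found.append(word)
    # Each worker is a separate process with its own copy of storedList,
    # so results must be returned to the parent instead of appended to a global
    return found


if __name__ == '__main__':  # required so worker processes that re-import this module don't start new pools
    with multiprocessing.Pool(6) as p:  # Creates 6 worker processes
        results = p.map(possible_words, range(2, 8))  # Each process calls possible_words() with a different input
    storedList = [word for words in results for word in words]  # flatten the per-length lists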

This runs in 32.3 seconds on my MacBook, so we shaved off roughly a third of the time. Part of the reason the gain is modest is likely that the work is uneven: the 6- and 7-letter lengths each produce 5,040 permutations, while 2-letter words produce only 42, so the pool spends most of its time waiting on two workers. There are probably ways to squeeze a bit more performance out of this approach, but it's also worth looking at the algorithm to see whether there are other ways to speed this up.
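One detail worth knowing when using `Pool.map`: it returns the workers' return values in input order, but each worker is a separate process, so changes it makes to module-level variables (such as appending to `storedList`) stay inside that worker. The minimal demonstration below uses hypothetical names `found` and `square_and_record`; the `__main__` guard matters on macOS and Windows, where worker processes re-import the module.

```python
import multiprocessing

found = []  # module-level list; each worker process gets its own copy


def square_and_record(x):
    found.append(x)  # only modifies the worker's copy of `found`
    return x * x     # the return value is sent back to the parent


if __name__ == '__main__':
    with multiprocessing.Pool(2) as pool:
        results = pool.map(square_and_record, range(5))
    print(results)  # → [0, 1, 4, 9, 16], in input order
    print(found)    # → [] in the parent process
```

Collecting what `map()` returns, rather than mutating shared globals, is the usual way to get results back from a pool.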

Right now, you're creating a list of dictionary words. When you check whether a potential word is in that list, Python has to scan through the list until it finds a match or reaches the end. My built-in dictionary has about 235K words, so it has to do about 235K string comparisons for every nonsense combination of letters it generates!

If you switch from a list to a set, Python can instead look up a value in average constant time using a hash function, rather than scanning each entry one at a time. Let's try that instead of multiprocessing:

import itertools

usrList = ['P', 'Y', 'T', 'H', 'O', 'N', 'S']
storedList = []
with open('/usr/share/dict/words', 'r') as dict_file:
    dicSet = {word.strip().upper() for word in dict_file}   # By changing [] to {}, this is now a set


def possible_words(length):
    for letter_permutation in itertools.permutations(usrList, length):
        word = ''.join(letter_permutation)
        if word in dicSet:  # This now only does 1 check, not 235,000
            storedList.append(word)


for word_length in range(2, 8):
    possible_words(word_length)

This version runs in 0.005 seconds, after changing just the list brackets to set braces!
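To see the list-versus-set difference on your own machine without the macOS dictionary file, here is a rough benchmark sketch. The word list is synthetic (random uppercase strings standing in for the real file, which is an assumption), so absolute timings will differ from the ones above and from machine to machine, but the set lookup should come out far faster:

```python
import itertools
import random
import string
import timeit

# Hypothetical stand-in for /usr/share/dict/words: 50,000 random uppercase
# strings plus a few real words we expect to find
random.seed(0)
fake_words = [
    ''.join(random.choices(string.ascii_uppercase, k=random.randint(2, 7)))
    for _ in range(50_000)
]
word_list = fake_words + ['TOY', 'HOP', 'PYTHON']
word_set = set(word_list)  # same words, hashed for average O(1) lookups

letters = ['P', 'Y', 'T', 'H', 'O', 'N', 'S']
# All 210 three-letter arrangements of the letters
candidates = [''.join(p) for p in itertools.permutations(letters, 3)]


def matches(container):
    """Return the candidates found in `container` (a list or a set)."""
    return [word for word in candidates if word in container]


list_seconds = timeit.timeit(lambda: matches(word_list), number=1)
set_seconds = timeit.timeit(lambda: matches(word_set), number=1)
print(f"list: {list_seconds:.4f}s, set: {set_seconds:.6f}s")
```

Both containers hold the same words, so `matches()` returns the same results either way; only the cost of each `in` check changes.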

In summary, multiprocessing is a useful tool, but it probably shouldn't be the first thing you try. You'll usually get much better results by thinking through the data structures and algorithms you're using and where the bottleneck is likely to be.

The classic solution to puzzles like this is not to check every possible permutation, but to convert both the sample letters and the words in the dictionary to a consistent, searchable form: sort their characters!

Now, instead of searching the dictionary for every permutation of 'PYTHONS', you just sort the letters to create the key 'HNOPSTY', and every valid word with the same key will be found in the map. Note that this finds only words that use all of the letters; to find shorter words you also need to look up the key of each subset of the letters.

Using a defaultdict, it is easy to build a lookup map of all the words in your dictionary. We use defaultdict(list) instead of a plain dict because multiple words may sort to the same key.

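from collections import defaultdict

# Load the same word list used above (macOS path; adjust for your system)
with open('/usr/share/dict/words', 'r') as dict_file:
    dictionary = [word.strip() for word in dict_file]

dictionary_mapping = defaultdict(list)

# Map each word's sorted letters to the list of words that share them
for word in dictionary:
    key = ''.join(sorted(word.upper()))
    dictionary_mapping[key].append(word)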

search_word = "PYTHONS"
search_key = ''.join(sorted(search_word.upper()))

# get all words that are anagrams of the search word, or the empty list if none
print(dictionary_mapping.get(search_key, []))
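The lookup above only finds words that use every letter. Since the question wants words of every length from 2 to 7, one way to extend it (a sketch, not part of the original answer) is to look up the sorted key of every combination of the letters. For 7 distinct letters that is at most 120 lookups, versus 13,692 permutations in the brute-force version. The names `build_anagram_map` and `find_words` and the tiny word list are hypothetical, for illustration only:

```python
import itertools
from collections import defaultdict


def build_anagram_map(words):
    """Map each sorted-letter key to the dictionary words that share it."""
    mapping = defaultdict(list)
    for word in words:
        mapping[''.join(sorted(word.upper()))].append(word.upper())
    return mapping


def find_words(letters, mapping, min_len=2):
    """Return every word that can be spelled from a subset of `letters`."""
    letters = sorted(letters.upper())
    found = set()
    for length in range(min_len, len(letters) + 1):
        # combinations() of a sorted list yields sorted tuples, i.e. valid keys;
        # set() removes duplicate combinations when a letter appears twice
        for combo in set(itertools.combinations(letters, length)):
            found.update(mapping.get(''.join(combo), []))
    return sorted(found)


# Hypothetical tiny word list standing in for the real dictionary file
words = ['python', 'pythons', 'typhons', 'toy', 'hop', 'shop', 'posh', 'cat']
mapping = build_anagram_map(words)
print(find_words('PYTHONS', mapping))
# → ['HOP', 'POSH', 'PYTHON', 'PYTHONS', 'SHOP', 'TOY', 'TYPHONS']
```

Using `mapping.get()` rather than indexing keeps the defaultdict from growing an empty entry for every combination that matches no word.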
