Python RegEx - hangman algorithm
I am trying to write a hangman algorithm. My idea for it goes like this:
Example:
#Each key corresponds to length of the word.
frequencyDict = {2: ['a', 'o', 'e', 'i', 'm', 'h', 'n', 'u', 's', 't', 'y', 'b', 'd', 'l', 'p', 'x', 'f', 'r', 'w', 'g', 'k', 'j'],
3: ['a', 'e', 'o', 'i', 't', 's', 'u', 'p', 'r', 'n', 'd', 'b', 'm', 'g', 'y', 'l', 'h', 'w', 'f', 'c', 'k', 'x', 'v', 'j', 'z', 'q'],
4: ['e', 'a', 's', 'o', 'i', 'l', 'r', 't', 'n', 'u', 'd', 'p', 'm', 'h', 'b', 'c', 'g', 'k', 'y', 'f', 'w', 'v', 'j', 'z', 'x', 'q'],
5: ['s', 'e', 'a', 'o', 'r', 'i', 'l', 't', 'n', 'd', 'u', 'c', 'p', 'y', 'm', 'h', 'g', 'b', 'k', 'f', 'w', 'v', 'z', 'x', 'j', 'q'],
6: ['e', 's', 'a', 'r', 'i', 'o', 'l', 'n', 't', 'd', 'u', 'c', 'p', 'm', 'g', 'h', 'b', 'y', 'f', 'k', 'w', 'v', 'z', 'x', 'j', 'q'],
7: ['e', 's', 'a', 'i', 'r', 'n', 'o', 't', 'l', 'd', 'u', 'c', 'g', 'p', 'm', 'h', 'b', 'y', 'f', 'k', 'w', 'v', 'z', 'x', 'j', 'q'],
8: ['e', 's', 'i', 'a', 'r', 'n', 'o', 't', 'l', 'd', 'c', 'u', 'g', 'p', 'm', 'h', 'b', 'y', 'f', 'k', 'w', 'v', 'z', 'x', 'q', 'j']}
I also have a generator of words in a dictionary:
dictionary = word_reader('C:\\Python27\\dictionary.txt', len(letters))
Which is based on this function:
#Strips dictionary of words that are too big or too small from the list
def word_reader(filename, L):
    L2 = L+2
    return (word.strip() for word in open(filename) \
            if len(word) < L2 and len(word) > 2)
p = re.compile('^e\D\D\D\De\D$', re.IGNORECASE)
will do it, but it might find words that contain 'e's in other places besides the first letter and the second-to-last letter.
So my first question is:
For example, if the word is monkey, the computer would just be given ----e-. The first step would be for it to strip from its dictionary all words that are not 6 letters long, and all words that do not conform perfectly to the '----e-' template, and put what remains in a newList. How do I go about doing this?
It then computes a NEW frequencyDict based on the relative frequency of letters in the words that are in its newList.
My current method of doing this looks like this:
from collections import Counter
cnt = Counter()
for words in dictionary:
    for letters in words:
        cnt[letters]+=1
Is this the most efficient way?
It would then use the newfrequencyDict to guess the most common letter, assuming it has not already been guessed. It continues to do this until (hopefully) the word is guessed.
Is this an efficient algorithm? Are there better implementations?
That's quite a lot of questions. I'll try to answer a few.
Your regular expression should look more like this: '^e[^e][^e][^e][^e]e[^e]$'. Those [^e] bits say "match any character that is not 'e'". Note that unlike your regex, this will match non-letter characters, but that shouldn't be a problem if you make sure your dictionary has only letters. Note that once you have uncovered more than one letter, you would put all the letters into each of those "don't match" sections. For example, say that the 'a' is guessed, so it's "ea---e-"; now you will match with the regex '^ea[^ae][^ae][^ae]e[^ae]$'.
The code that builds the regex just needs to a) take all non-hyphen letters in the string as a set (in this case {'a', 'e'}), b) flatten the set into a "match-all-but-this" regex fragment ([^ae]) -- note that the order is not important, which is why I used a set, c) substitute each hyphen with one of those (ea[^ae][^ae][^ae]e[^ae]), and d) finally just put a '^' at the front and a '$' at the end.
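The four steps a) to d) can be sketched as a small helper. This is only an illustration, not code from the original answer: the name template_to_regex and the sample word list are hypothetical, and it assumes the template marks unknown positions with '-' and that the dictionary contains only letters.

```python
import re

def template_to_regex(template):
    # a) collect the letters already revealed in the template
    found = set(ch for ch in template if ch != '-')
    # b) build the "anything but these" fragment; sorted only so the
    #    resulting pattern text is stable from run to run
    if found:
        unknown = '[^' + ''.join(re.escape(ch) for ch in sorted(found)) + ']'
    else:
        unknown = '.'  # nothing revealed yet, so any character is allowed
    # c) replace each hyphen with the fragment, keep revealed letters as-is
    body = ''.join(unknown if ch == '-' else re.escape(ch) for ch in template)
    # d) anchor the pattern at both ends
    return re.compile('^' + body + '$', re.IGNORECASE)

pattern = template_to_regex('ea---e-')
words = ['earlier', 'eagerer', 'eatable', 'monkey']  # made-up sample words
candidates = [w for w in words if pattern.match(w)]
```

Here only 'earlier' survives: 'eagerer' and 'eatable' repeat an already-revealed letter in an unknown slot, and 'monkey' has the wrong length.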
There's nothing particularly magical about regexes, and matching them against your whole dictionary is still going to take O(n) time. I'd recommend writing your own function that determines if a word is a match for a template, and running your dictionary-so-far through that.
Here's an example function:
def matches_template(word, template):
    found_chars = set(x for x in template if x != '-')
    for char, template_char in zip(word, template):
        if template_char == '-':
            if char in found_chars: return False
        else:
            if template_char != char: return False
    return True
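One caveat: zip stops at the shorter of its two arguments, so the function as written only compares the overlapping prefix and relies on the dictionary having been filtered by length beforehand. A minimal sketch of applying it to a word list, with an explicit length check added, might look like this; filter_candidates is a hypothetical name and the words are made up for illustration.

```python
def matches_template(word, template):
    # Same logic as the example function, plus a length check,
    # because zip() silently stops at the shorter argument.
    if len(word) != len(template):
        return False
    found_chars = set(x for x in template if x != '-')
    for char, template_char in zip(word, template):
        if template_char == '-':
            # an already-revealed letter cannot hide in an unknown slot
            if char in found_chars:
                return False
        elif template_char != char:
            return False
    return True

def filter_candidates(words, template):
    # keep only the words still consistent with the template
    return [w for w in words if matches_template(w, template)]

sample = ['monkey', 'donkey', 'keeper', 'monkeys', 'turkey']
remaining = filter_candidates(sample, '----e-')
```

'keeper' is dropped because it has an 'e' in a position the template marks as unknown, and 'monkeys' is dropped by the length check.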
As far as determining the next character to guess, you probably don't want to select the most frequent character. Instead, you want to select the character that comes closest to being in 50% of words, meaning you eliminate the most possibilities either way. Even that isn't optimal - it could be that certain characters are more likely to occur twice in the word, and therefore eliminate a larger proportion of candidates - but it's closer.
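The "closest to 50% of words" heuristic could be sketched as follows. This is an assumption-laden illustration rather than the answerer's code: best_guess is a hypothetical name, each letter is counted once per word (not once per occurrence), and ties are broken alphabetically purely to make the result deterministic.

```python
from collections import Counter

def best_guess(candidates, guessed):
    # Count each letter once per word (via set(word)), so the tally is
    # "number of words containing the letter", not total occurrences.
    containing = Counter()
    for word in candidates:
        containing.update(set(word) - set(guessed))
    if not containing:
        return None  # no candidates, or every letter was already guessed
    half = len(candidates) / 2.0
    # Pick the letter whose word count is closest to half the candidates;
    # iterating in sorted order breaks ties alphabetically.
    return min(sorted(containing), key=lambda ch: abs(containing[ch] - half))

# made-up candidate list for the template '----e-'
guess = best_guess(['monkey', 'donkey', 'turkey', 'hockey'], guessed={'e'})
```

In this sample 'k' and 'y' appear in every word, so guessing them would eliminate nothing, while 'n' appears in exactly half of the words and is the letter chosen.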