
Returns an empty list instead of bigrams

The first block of code below returns the expected output:

[('the', 23135851162), ('of', 13151942776), ('and', 12997637966), ('to', 12136980858), ('a', 9081174698)]

from itertools import islice
import pkg_resources
from symspellpy import SymSpell

sym_spell = SymSpell()
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
sym_spell.load_dictionary(dictionary_path, 0, 1)

# Print out first 5 elements to demonstrate that dictionary is
# successfully loaded
print(list(islice(sym_spell.words.items(), 5)))

But the next block of code returns an empty list:

from itertools import islice
import pkg_resources
from symspellpy import SymSpell

sym_spell = SymSpell()
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_dictionary_en_82_765.txt")
sym_spell.load_bigram_dictionary(dictionary_path, 0, 2)

# Print out first 5 elements to demonstrate that dictionary is
# successfully loaded
print(list(islice(sym_spell.bigrams.items(), 5)))

The expected output is:

[('abcs of', 10956800), ('aaron and', 10721728), ('abbott and', 7861376), ('abbreviations and', 13518272), ('aberdeen and', 7347776)]

according to this page:

https://symspellpy.readthedocs.io/en/latest/examples/dictionary.html

I would like to know what mistake I made in the second block of code.

The second example, both on the linked page and in your question, references the wrong data file. You have to refer to the bigram data file that is included with the package.

The documentation for these examples shows the expected data format for each one, and the two formats differ. Yet both examples point to the same data file. One of them must therefore be wrong, and it is the second: it should refer to the bigram data file.
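To illustrate why the wrong file plausibly yields an empty result, here is a minimal sketch of a bigram loader. It is a simplified stand-in written for this answer, not symspellpy's actual implementation: the function name `load_bigram_lines` is hypothetical, and the sample lines are assumed to mirror the two file formats (word plus count, and two words plus count). A loader that expects three whitespace-separated fields per line has nothing to keep when every line has only two.

```python
from io import StringIO


def load_bigram_lines(stream, term_index=0, count_index=2):
    """Hypothetical, simplified bigram loader (not the library's code)."""
    bigrams = {}
    for line in stream:
        parts = line.rstrip().split()
        # A bigram line needs two words plus a count; shorter lines are skipped.
        if len(parts) < 3:
            continue
        try:
            count = int(parts[count_index])
        except (ValueError, IndexError):
            continue
        key = f"{parts[term_index]} {parts[term_index + 1]}"
        bigrams[key] = count
    return bigrams


# Assumed sample lines in the unigram format: "<word> <count>"
unigram_sample = StringIO("the 23135851162\nof 13151942776\n")
# Assumed sample lines in the bigram format: "<word1> <word2> <count>"
bigram_sample = StringIO("abcs of 10956800\naaron and 10721728\n")

unigram_result = load_bigram_lines(unigram_sample)
bigram_result = load_bigram_lines(bigram_sample)

print(unigram_result)  # {}
print(bigram_result)   # {'abcs of': 10956800, 'aaron and': 10721728}
```

Under these assumptions, the two-column input is skipped line by line without any error being raised, which matches the symptom in the question: no exception, just an empty result.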

Here's the complete code that works correctly:

from itertools import islice
import pkg_resources
from symspellpy import SymSpell

sym_spell = SymSpell()
dictionary_path = pkg_resources.resource_filename(
    "symspellpy", "frequency_bigramdictionary_en_243_342.txt")  # fixed: refers to the bigram data file
sym_spell.load_bigram_dictionary(dictionary_path, 0, 2)

# Print out first 5 elements to demonstrate that dictionary is
# successfully loaded
print(list(islice(sym_spell.bigrams.items(), 5)))

Result:

[('abcs of', 10956800), ('aaron and', 10721728), ('abbott and', 7861376), ('abbreviations and', 13518272), ('aberdeen and', 7347776)]
