
Is there a way to force SymSpell Python to return more than one correction recommendation?

I'm using the symspellpy module in Python for query correction. It is really useful and fast, but I'm having an issue with it.

Is there a way to force SymSpell to return more than one correction recommendation? I need several candidates so that my application can pick the best correction itself.

I'm calling Symspell like this:

suggestions = sym_spell.lookup(query, VERBOSITY_ALL, max_edit_distance=3)

Example of what I'm trying to do:

query = "resende". The result I want is ["resende", "rezende"], but the method returns only ["resende"]. Note that both "resende" and "rezende" are in my dictionary.

Merely a typo. Replace the underscore in the verbosity argument (VERBOSITY_ALL in the question) with a dot:

Verbosity.ALL

The three options are CLOSEST, TOP and ALL.
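To make the difference between the three options concrete, here is a minimal pure-Python sketch of the selection rule each one applies, as I understand it from the symspellpy documentation: TOP keeps the single best suggestion, CLOSEST keeps every suggestion at the smallest distance found, and ALL keeps everything within max_edit_distance. It does not call symspellpy at all. The names Mode, Candidate and select are hypothetical, and the counts are invented for illustration.

```python
from enum import Enum
from typing import List, NamedTuple


class Mode(Enum):
    """Hypothetical stand-in for symspellpy's Verbosity enum."""
    TOP = 0
    CLOSEST = 1
    ALL = 2


class Candidate(NamedTuple):
    term: str
    distance: int
    count: int


def select(candidates: List[Candidate], mode: Mode, max_edit_distance: int) -> List[Candidate]:
    """Filter and order candidates the way each verbosity level is described."""
    within = [c for c in candidates if c.distance <= max_edit_distance]
    # Order by edit distance (ascending), then by frequency (descending).
    within.sort(key=lambda c: (c.distance, -c.count))
    if not within or mode is Mode.ALL:
        return within
    best = within[0].distance
    closest = [c for c in within if c.distance == best]
    return closest[:1] if mode is Mode.TOP else closest


# Invented counts; distances are relative to the query "resende".
candidates = [
    Candidate("resende", 0, 120),
    Candidate("rezende", 1, 300),
    Candidate("resenha", 2, 40),
]

print([c.term for c in select(candidates, Mode.TOP, 3)])      # ['resende']
print([c.term for c in select(candidates, Mode.CLOSEST, 3)])  # ['resende']
print([c.term for c in select(candidates, Mode.ALL, 3)])      # ['resende', 'rezende', 'resenha']
```

Because the query itself is in the dictionary at distance 0, TOP and CLOSEST both stop at "resende". Only ALL also surfaces "rezende".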

Couple of other things in SymSpell...

Four algorithm choices

Described here

Supported edit distance algorithm choices.

LEVENSHTEIN      = 0        Levenshtein algorithm
DAMERAU_OSA      = 1        Damerau optimal string alignment algorithm  (default)
LEVENSHTEIN_FAST = 2        Fast Levenshtein algorithm
DAMERAU_OSA_FAST = 3        Fast Damerau optimal string alignment algorithm

DAMERAU_OSA    # high count/frequency wins when using .ALL but distances tied?
LEVENSHTEIN    # lowest edit distance wins (fewest changes needed)
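One difference between the two families that can be stated with certainty is how they count a swap of two adjacent letters: plain Levenshtein needs two edits, while Damerau optimal string alignment (OSA) counts the transposition as one. The sketch below uses textbook implementations written for illustration. They are not symspellpy's own code, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic Levenshtein distance: insert, delete, substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + cost))  # substitution
        prev = cur
    return prev[-1]


def damerau_osa(a: str, b: str) -> int:
    """Optimal string alignment: Levenshtein plus adjacent transposition."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,
                          d[i][j - 1] + 1,
                          d[i - 1][j - 1] + cost)
            # Two adjacent characters swapped count as a single edit.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(levenshtein("resende", "rezende"), damerau_osa("resende", "rezende"))  # 1 1
print(levenshtein("resedne", "resende"), damerau_osa("resedne", "resende"))  # 2 1
```

A substitution such as s/z costs 1 under both algorithms, so the choice only matters for inputs that contain swapped letters.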

To change from the default, overwrite it with one of them:

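from symspellpy.editdistance import DistanceAlgorithm
# _distance_algorithm is a private attribute (leading underscore),
# so this assignment may break between symspellpy versions.
sym_spell._distance_algorithm = DistanceAlgorithm.LEVENSHTEIN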

Output object details

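from symspellpy import Verbosity

# sym_spell is assumed to be a SymSpell instance with a dictionary already loaded.
word = 'something'
matches = sym_spell.lookup(word, Verbosity.ALL, max_edit_distance=2)
for match in matches:   # each match exposes term, distance and count
    print(f'{word} -> {match.term}   {match.distance}   {match.count}')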

Using collections Counter() with SymSpell instead of loading words from file

As of April 2022, SymSpell can only read the dictionary of known-good words from a file. However, the method below can be added inside symspellpy.py so that it can also load from a collections Counter() or any other dictionary of word: count pairs. It is a quick hack, but it works for my purposes...

def load_Counter_dictionary(self, counts_each):
    for key, count in counts_each.items():
        self.create_dictionary_entry(key, count)

Can then drop the use of load_dictionary(), for something like this instead...

from collections import Counter

sym_spell.load_Counter_dictionary(Counter(words_list))

I resorted to this because a CSV file with over a million records was already loaded into a pandas DataFrame. One column holds codes (think words), some occurring in large numbers (likely correct) along with outliers to be corrected, and another column already holds the count for each code. Rather than saving the counts dict to a file (expensive) and then having SymSpell reload it, this approach is direct and efficient.
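If editing symspellpy.py is undesirable, the same loop could probably live outside the library as a plain function. The sketch below assumes only that the target object exposes create_dictionary_entry(key, count), the method used in the snippet above. load_counts and RecordingDictionary are hypothetical names, and RecordingDictionary is a stand-in that lets the example run without symspellpy installed.

```python
from collections import Counter
from typing import Mapping


def load_counts(sym_spell, counts: Mapping[str, int]) -> int:
    """Feed word -> count pairs to any object with create_dictionary_entry()."""
    loaded = 0
    for term, count in counts.items():
        sym_spell.create_dictionary_entry(term, count)
        loaded += 1
    return loaded


class RecordingDictionary:
    """Hypothetical stand-in for a SymSpell instance; it only records entries."""

    def __init__(self):
        self.entries = {}

    def create_dictionary_entry(self, key, count):
        self.entries[key] = self.entries.get(key, 0) + count
        return True


words_list = ["rezende", "resende", "rezende", "rezende"]
target = RecordingDictionary()
n = load_counts(target, Counter(words_list))
print(n, target.entries)  # 2 {'rezende': 3, 'resende': 1}
```

With the real library, a SymSpell instance would be passed in place of RecordingDictionary.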
