
Using Binary Search for Spelling Check

I am trying to use binary search to check the spelling of the words in a file and print out the words that are not in the dictionary. At the moment, most of the correctly spelled words are being reported as misspelled (that is, as words that cannot be found in the dictionary). The dictionary is also a text file, and it looks like this:

abactinally
abaction
abactor
abaculi
abaculus
abacus
abacuses
Abad
abada
Abadan
Abaddon
abaddon
abadejo
abadengo
abadia

Code:

def binSearch(x, nums):
    low = 0
    high = len(nums)-1
    while low <= high:          
        mid = (low + high)//2   
        item = nums[mid]
        if x == item :
            print(nums[mid])
            return mid
        elif x < item:         
            high = mid - 1      
        else:                  
            low = mid + 1       
    return -1                  



def main():

    print("This program performs a spell-check in a file")
    print("and prints a report of the possibly misspelled words.\n")

    # get the sequence of words from the file
    fname = input("File to analyze: ")
    text = open(fname,'r').read()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.split()

    #import dictionary from file
    fname2 =input("File of dictionary: ")
    dic = open(fname2,'r').read()
    dic = dic.split()

    #perform binary search for misspelled words
    misw = []
    for w in words:
        m = binSearch(w,dic)
        if m == -1:
            misw.append(w)

Your binary search works perfectly! You don't seem to be removing all special characters, though.
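One caveat about the data rather than the algorithm: binary search only works if the list is sorted under the same comparison the search uses. Judging by the excerpt in the question, the dictionary file seems to be ordered without regard to case ("Abad" sits between "abacuses" and "abada"), while Python compares strings case-sensitively, so every uppercase letter sorts before every lowercase one. If that holds for the whole file, lookups can miss words that are actually present. A minimal sketch of one way around it, assuming it is acceptable to fold everything to lowercase (normalize_dictionary and in_dictionary are illustrative names, not part of the original code):

```python
from bisect import bisect_left

def normalize_dictionary(entries):
    # Lowercase every entry, drop duplicates such as 'Abaddon'/'abaddon',
    # and re-sort so the order matches Python's own string comparison.
    return sorted({entry.lower() for entry in entries})

def in_dictionary(word, sorted_entries):
    # Standard-library binary search; the word is lowercased the same way
    # as the dictionary entries before comparing.
    key = word.lower()
    i = bisect_left(sorted_entries, key)
    return i < len(sorted_entries) and sorted_entries[i] == key

# A few entries in the order they appear in the question's excerpt.
excerpt = ['abacus', 'abacuses', 'Abad', 'abada', 'Abadan',
           'Abaddon', 'abaddon', 'abadejo']
dic = normalize_dictionary(excerpt)
```

The trade-off is that lowercasing loses the distinction between proper nouns and ordinary words. If that matters, the list could instead be sorted and searched with one shared key, as long as both sides use the same ordering.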

Testing your code (with a sentence of my own):

def main():

   print("This program performs a spell-check in a file")
   print("and prints a report of the possibly misspelled words.\n")

   text = 'An old mann gathreed his abacus, and ran a mile.  His abacus\n ran two miles!'
   for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
       text = text.replace(ch, ' ')
   words = text.lower().split(' ')

   dic = ['a','abacus','an','and','arranged', 'gathered', 'his', 'man','mile','miles','old','ran','two']

   #perform binary search for misspelled words
   misw = []
   for w in words:
       m = binSearch(w,dic)
       if m == -1:
           misw.append(w)
   print(misw)

prints as output ['mann', 'gathreed', '', '', '', 'abacus\n', '']

Those extra empty strings '' come from the punctuation you replaced with spaces: a run of consecutive spaces produces empty strings when you split on ' '. The \n (a line break) is a little more problematic, as it is something you will definitely see in external text files but is not intuitive to account for. Instead of looping over for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':, check each character with .isalpha() and remove anything that is neither a letter nor a space. Try this:

def main():

   ...

   text = 'An old mann gathreed his abacus, and ran a mile. His abacus\n ran two miles!'
   for ch in text:
       if not ch.isalpha() and ch != ' ':
           #we want to keep spaces or else we'd only have one word in our entire text
           text = text.replace(ch, '') #replace with empty string (basically, remove)
   words = text.lower().split(' ')

   #import dictionary
   dic = ['a','abacus','an','and','arranged', 'gathered', 'his', 'man','mile','miles','old','ran','two']

   #perform binary search for misspelled words
   misw = []
   for w in words:
       m = binSearch(w,dic)
       if m == -1:
           misw.append(w)
   print(misw)

Output:

This program performs a spell-check in a file
and prints a report of the possibly misspelled words.

['mann', 'gathreed']
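A slightly more defensive variant of the same idea, offered as a sketch rather than the definitive fix: turn every non-letter into a space instead of deleting it, then call split() with no argument. Deleting a line break outright would glue together two words that are separated only by that line break, and split(' ') still yields empty strings wherever two spaces meet; split() with no argument splits on any run of whitespace. The helper name extract_words is my own, and the sample sentence is adapted from the one above (here the line break is the only thing between "abacus" and "ran").

```python
def extract_words(text):
    # Replace anything that is not a letter with a space, so punctuation
    # and line breaks act as word separators instead of vanishing.
    cleaned = ''.join(ch if ch.isalpha() else ' ' for ch in text)
    # split() with no argument collapses runs of whitespace,
    # so no empty strings end up in the result.
    return cleaned.lower().split()

text = 'An old mann gathreed his abacus, and ran a mile.  His abacus\nran two miles!'
dic = ['a', 'abacus', 'an', 'and', 'arranged', 'gathered', 'his',
       'man', 'mile', 'miles', 'old', 'ran', 'two']

words = extract_words(text)
# A plain membership test keeps this snippet self-contained;
# binSearch(w, dic) == -1 could be used here instead.
misw = [w for w in words if w not in dic]
```

One known limitation: apostrophes are treated as separators too, so "don't" becomes "don" and "t". Handling contractions would need an extra rule.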

Hope this was helpful! Feel free to comment if you need clarification or something doesn't work.
