I am trying to use binary search to check the spelling of words in a file and print out the words that are not in the dictionary. But as of now, most of the correctly spelled words are being printed as misspelled (words that cannot be found in the dictionary). The dictionary file is also a text file, and looks like this:
abactinally
abaction
abactor
abaculi
abaculus
abacus
abacuses
Abad
abada
Abadan
Abaddon
abaddon
abadejo
abadengo
abadia
Code:
def binSearch(x, nums):
    low = 0
    high = len(nums) - 1
    while low <= high:
        mid = (low + high) // 2
        item = nums[mid]
        if x == item:
            print(nums[mid])
            return mid
        elif x < item:
            high = mid - 1
        else:
            low = mid + 1
    return -1
def main():
    print("This program performs a spell-check in a file")
    print("and prints a report of the possibly misspelled words.\n")
    # get the sequence of words from the file
    fname = input("File to analyze: ")
    text = open(fname,'r').read()
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.split()
    #import dictionary from file
    fname2 = input("File of dictionary: ")
    dic = open(fname2,'r').read()
    dic = dic.split()
    #perform binary search for misspelled words
    misw = []
    for w in words:
        m = binSearch(w,dic)
        if m == -1:
            misw.append(w)
Your binary search works perfectly! You don't seem to be removing all special characters, though.
Testing your code (with a sentence of my own):
def main():
    print("This program performs a spell-check in a file")
    print("and prints a report of the possibly misspelled words.\n")
    text = 'An old mann gathreed his abacus, and ran a mile. His abacus\n ran two miles!'
    for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':
        text = text.replace(ch, ' ')
    words = text.lower().split(' ')
    dic = ['a','abacus','an','and','arranged', 'gathered', 'his', 'man','mile','miles','old','ran','two']
    #perform binary search for misspelled words
    misw = []
    for w in words:
        m = binSearch(w,dic)
        if m == -1:
            misw.append(w)
    print(misw)
This prints ['mann', 'gathreed', '', '', 'abacus\n', ''] as output.
Those empty strings '' come from the punctuation you replaced with spaces: splitting on a single space leaves an empty string wherever two spaces end up side by side or a space ends the text. The \n (a line break) is a little more problematic, as it is something you will definitely see in external text files but is not intuitive to account for. Instead of looping over for ch in '!"#$%&()*+,-./:;<=>?@[\\]^_`{|}~':, just check whether each character .isalpha().
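To make the source of the empty strings and the stray \n concrete, here is a minimal sketch comparing split(' ') with split() on the test sentence after its punctuation has been replaced by spaces. The variable names by_space and by_whitespace are only illustrative.

```python
# The test sentence with punctuation already replaced by spaces.
text = 'An old mann gathreed his abacus  and ran a mile  His abacus\n ran two miles '

# split(' ') cuts at every single space, so doubled or trailing spaces
# produce '' and the newline stays attached to the preceding word.
by_space = text.lower().split(' ')

# split() with no argument cuts on any run of whitespace, newlines included.
by_whitespace = text.lower().split()

print(by_space)
print(by_whitespace)
```

Note that the code in the question calls split() with no argument, which is the form that avoids both the empty strings and the trailing newline.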
Try this:
def main():
    ...
    text = 'An old mann gathreed his abacus, and ran a mile. His abacus\n ran two miles!'
    for ch in text:
        if not ch.isalpha() and not ch == ' ':
            #we want to keep spaces or else we'd only have one word in our entire text
            text = text.replace(ch, '') #replace with empty string (basically, remove)
    words = text.lower().split(' ')
    #import dictionary
    dic = ['a','abacus','an','and','arranged', 'gathered', 'his', 'man','mile','miles','old','ran','two']
    #perform binary search for misspelled words
    misw = []
    for w in words:
        m = binSearch(w,dic)
        if m == -1:
            misw.append(w)
    print(misw)
Output:
This program performs a spell-check in a file
and prints a report of the possibly misspelled words.
['mann', 'gathreed']
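Two things the test sentence does not exercise may still matter with real files. First, deleting a line break that has no space next to it glues two words together ('abacus\nran' becomes 'abacusran'). Second, the dictionary sample in the question mixes capitalized and lowercase entries (Abad sits between abacuses and abada), so it appears to be sorted without regard to case, while Python compares strings case-sensitively and binary search needs the list sorted under the same comparison it uses. The following is only a sketch under those assumptions, not a definitive fix: tokenize, load_dictionary and in_dictionary are hypothetical helper names, the regex keeps only the ASCII letters a-z (so apostrophes and accented letters are dropped), and the lookup uses the standard-library bisect module in place of binSearch.

```python
import re
from bisect import bisect_left

def tokenize(text):
    """Return the lowercase a-z words in text (hypothetical helper)."""
    return re.findall(r"[a-z]+", text.lower())

def load_dictionary(lines):
    """Lowercase, de-duplicate and sort entries (hypothetical helper)."""
    return sorted({line.strip().lower() for line in lines if line.strip()})

def in_dictionary(word, dic):
    """Binary search on a sorted list using bisect_left."""
    i = bisect_left(dic, word)
    return i < len(dic) and dic[i] == word

# A few entries in the same order as the sample in the question.
entries = ['abacus', 'abacuses', 'Abad', 'abada', 'Abadan', 'Abaddon', 'abaddon']
dic = load_dictionary(entries)

# The newline here has no space around it, and 'abadda' is a deliberate typo.
words = tokenize('Abacus\nabada, Abadan; abadda!')
misspelled = [w for w in words if not in_dictionary(w, dic)]
print(misspelled)
```

Lowercasing both the text and the dictionary means proper nouns such as Abadan are matched regardless of capitalization; if case should count, the dictionary can instead be sorted as-is with sorted() and the text left unmodified.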
Hope this was helpful! Feel free to comment if you need clarification or something doesn't work.