I'm trying to write an anagram service. The first stage of the program is to go through a dictionary of words and create a Python dictionary whose keys are word lengths and whose values are lists of the words of that length, i.e.:
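from collections import defaultdict

def processedDictionary():
    d = defaultdict(list)
    # Read every line of the word list; the file is closed on leaving the block
    with open(dictionaryFile, "r") as f:
        lines = f.readlines()
    # Group the lines by length
    for line in lines:
        length = len(line)
        d[length].append(line)
    return d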
This means that the anagram word only has to be compared to words of the same length, with processedDictionary()[length], which speeds up the script. However, I was trying to optimise the script even more, because it is silly that the dictionary has to be 'processed' every time somebody anagrams a word, so I looked at pickle for loading the already sorted dictionary every time:
import pickle

def processedDictionary():
    # Load the previously pickled dictionary from disk
    with open("dic.obj", "rb") as file:
        return pickle.load(file)
dic.obj is a 2MB dump of the processed dictionary. However, even with cPickle, the pickled dictionary takes about twice as long to load as the original script! Can anybody suggest what I am missing here and what the correct route to optimise the dictionary loading is?
When you dump the data, make sure you specify the protocol to use:
with open('dict.obj', 'wb') as fh:
    pickle.dump(obj, fh, pickle.HIGHEST_PROTOCOL)
When loading, you should also see a speed increase if you switch to Python 3 (if possible):
def processedDictionary():
    with open('dict.obj', 'rb') as fh:
        return pickle.load(fh)
Storing the pickled file on a separate medium would also be recommended, because running everything from the same device will slow down the reading process.
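To check whether the protocol really matters for your data, it helps to time the load yourself. The following is only a minimal sketch under stated assumptions: the helper names (build_index, dump_index, load_index), the tiny sample word list, and the temporary file names are all made up for illustration, and the timings will vary with your machine and your real dictionary.

```python
import os
import pickle
import tempfile
import timeit
from collections import defaultdict


def build_index(words):
    """Group words by length (hypothetical stand-in for processedDictionary)."""
    index = defaultdict(list)
    for word in words:
        index[len(word)].append(word)
    return index


def dump_index(index, path, protocol=pickle.HIGHEST_PROTOCOL):
    """Pickle the index to path using the given protocol."""
    with open(path, "wb") as fh:
        pickle.dump(index, fh, protocol)


def load_index(path):
    """Load a previously pickled index from path."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Assumed sample data; substitute the contents of your real word list.
words = ["tea", "eat", "ate", "stone", "notes", "onset", "a"]
index = build_index(words)

# Write the same index twice: once with the old text protocol 0,
# once with the highest protocol this interpreter supports.
tmp_dir = tempfile.mkdtemp()
path_p0 = os.path.join(tmp_dir, "dict_p0.obj")
path_hi = os.path.join(tmp_dir, "dict_hi.obj")
dump_index(index, path_p0, protocol=0)
dump_index(index, path_hi)

# Both files must round-trip to the same data.
assert load_index(path_p0) == load_index(path_hi) == index

# Time repeated loads of each file.
for label, path in (("protocol 0", path_p0), ("highest protocol", path_hi)):
    seconds = timeit.timeit(lambda: load_index(path), number=200)
    print("%s: %.4f s for 200 loads" % (label, seconds))
```

With a sample this small the difference may be negligible; run the same comparison against your full 2MB dump before drawing conclusions.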