How can I efficiently match words that are the same except for the last letter?
data = ['ades', 'adey', 'adhere', 'adherent', 'admin', 'admit', 'adverb', 'advert', 'adipocere', 'adipocerous', 'adjoining', 'adjoint', 'adjudicate', 'adjudication', 'adjunct']
The actual data is longer and my implementation below takes too long to run:
temp_data = data
count = 0
matches = {}
while count < len(data):
    for word in data:
        if word[:-1] == data[count][:-1] and data.index(word) != count:
            matches[data[count]] = word
            temp_data.remove(data[count])
            temp_data.remove(word)
    count += 1
print(matches)
This correctly prints:
{'ades': 'adey', 'advert': 'adverb', 'admin': 'admit'}
I'm new to Python, so any suggestions would be appreciated :)
You're comparing every word against every other word, and data.index(word) scans the list again on each comparison just to make sure you're not comparing a word against itself, which gives O(n³) time. You can get it down to O(n²) by keeping track of the index in the inner loop:
for j, word in enumerate(data):
    if word[:-1] == data[count][:-1] and j != count:
        matches[data[count]] = word
        temp_data.remove(data[count])
        temp_data.remove(word)
and then get it to O(n) by just grouping the words by everything except their last letter:
from collections import defaultdict

groups = defaultdict(list)
for word in data:
    groups[word[:-1]].append(word)
print(list(groups.values()))
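which can also be done using itertools.groupby if your list is ordered so that words sharing everything but the last letter sit next to each other (for example, after sorting by word[:-1]):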
import itertools

def init(word):
    return word[:-1]

print([list(words) for key, words in itertools.groupby(data, init)])
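Both grouping versions also return the words that have no partner, each in a one-element list. A minimal sketch of keeping only the real matches, assuming a shortened copy of the sample data and a hypothetical variable name matched:

```python
from collections import defaultdict

# Shortened copy of the sample data from the question.
data = ['ades', 'adey', 'adhere', 'adherent', 'admin', 'admit', 'adverb', 'advert']

# Group words by everything except the last letter.
groups = defaultdict(list)
for word in data:
    groups[word[:-1]].append(word)

# Keep only the groups in which at least two words share a stem.
# 'matched' is an illustrative name, not part of the original answer.
matched = [words for words in groups.values() if len(words) > 1]
print(matched)
# [['ades', 'adey'], ['admin', 'admit'], ['adverb', 'advert']]
```

This stays a single pass over the data plus a single pass over the groups, and it does not depend on the input order.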
Assuming the list is already sorted (otherwise you need to sort it first) and that at most two elements in the list match the criterion, you can get the result with a dictionary comprehension and zip:
>>> data = ['ades', 'adey', 'adhere', 'adherent', 'admin', 'admit', 'adverb', 'advert', 'adipocere', 'adipocerous', 'adjoining', 'adjoint', 'adjudicate', 'adjudication', 'adjunct']
# data.sort() --> if data is not already sorted
>>> {i: j for i, j in zip(data, data[1:]) if i[:-1]==j[:-1]}
{'admin': 'admit', 'adverb': 'advert', 'ades': 'adey'}
PS: I do not think regex is the right tool for achieving the desired result.
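The two assumptions behind the zip approach matter in practice. The sketch below wraps the same comprehension in a hypothetical helper, pair_adjacent (a name made up for this illustration), and runs it on two small made-up inputs to show what happens when the assumptions do not hold:

```python
def pair_adjacent(words):
    """Pair each word with its right-hand neighbour in sorted order
    when the two differ only in the last letter."""
    words = sorted(words)
    return {i: j for i, j in zip(words, words[1:]) if i[:-1] == j[:-1]}

# Three words share the stem 'ca', so the result chains them
# instead of reporting a single group.
print(pair_adjacent(['cab', 'cat', 'can', 'dog']))
# {'cab': 'can', 'can': 'cat'}

# 'adab' sorts between 'ada' and 'adb', so that pair is never
# adjacent and is missed.
print(pair_adjacent(['ada', 'adb', 'adab']))
# {}
```

If either case can occur in the real data, grouping by word[:-1] in a dictionary is probably the safer choice, since it does not rely on neighbours in sorted order.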