My goal is to identify the odd element in the list below.
list_1=['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']
The odd item is taskb2, as the other four items start with taska.
They all have equal length, hence discriminating using the len function will not work. Any ideas? Thanks.
If you simply want to find the item that does not start with 'taska', then you could use the following list comprehension:
>>> list_1=['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']
>>> print [l for l in list_1 if not l.startswith('taska')]
['taskb2']
Another option is to use filter + lambda:
>>> filter(lambda l: not l.startswith('taska'), list_1)
['taskb2']
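The two snippets above are written for Python 2. On Python 3, print is a function and filter returns a lazy iterator rather than a list, so an equivalent sketch (still assuming the 'taska' prefix is known in advance) might look like this:

```python
list_1 = ['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']

# List comprehension: keep the items that do not start with the known prefix
odd_by_comprehension = [item for item in list_1 if not item.startswith('taska')]

# filter() is lazy on Python 3, so wrap it in list() to materialise the result
odd_by_filter = list(filter(lambda item: not item.startswith('taska'), list_1))

print(odd_by_comprehension)
print(odd_by_filter)
```

Both variants return every non-matching item, so they also work if the list contains more than one odd element.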
For this particular list, an alphabetical sort also does the job, because the odd item happens to sort last:
print sorted(list_1)[-1]
Don't wanna sort? Try an O(n) time-complexity solution with O(1) space complexity:
print max(list_1)
If you know what the basic structure of the items will be, then it's easy.
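For instance, if the structure is "a fixed-length prefix followed by a number", one possible sketch is to count the prefixes and report whatever does not share the most common one. The helper name odd_by_prefix and the default prefix length of 5 are assumptions made for illustration, not something given in the question:

```python
from collections import Counter

def odd_by_prefix(items, prefix_len=5):
    """Return the items whose leading prefix_len characters differ from the most common prefix."""
    if not items:
        return []
    # Count how often each prefix occurs, e.g. {'taska': 4, 'taskb': 1}
    prefixes = Counter(item[:prefix_len] for item in items)
    common_prefix, _ = prefixes.most_common(1)[0]
    # Anything that does not share the dominant prefix is considered odd
    return [item for item in items if item[:prefix_len] != common_prefix]

list_1 = ['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']
print(odd_by_prefix(list_1))
```

Unlike sorting or max, this does not depend on where the odd item falls alphabetically, but it does rely on the prefix length being known.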
If you don't know the structure of your items a priori, one approach is to score the items according to their similarity against each other. Using info from this question for the standard library module difflib:
import difflib
import itertools
list_1=['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']
# Initialize a dict, keyed on the items, with 0.0 score to start
score = dict.fromkeys(list_1, 0.0)
# Arrange the items in pairs with each other
for w1, w2 in itertools.combinations(list_1, 2):
    # Performs the matching function - see difflib docs
    seq=difflib.SequenceMatcher(a=w1, b=w2)
    # increment the "match" score for each
    score[w1]+=seq.ratio()
    score[w2]+=seq.ratio()
# Print the results
>>> score
{'taska1': 3.166666666666667,
'taska2': 3.3333333333333335,
'taska3': 3.166666666666667,
'taska7': 3.1666666666666665,
'taskb2': 2.833333333333333}
It turns out that taskb2 has the lowest score!
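To pull out the lowest-scoring item programmatically rather than reading it off the printed dict, the scoring loop can be wrapped in a small function. This is only a sketch: the name least_similar is hypothetical, and it assumes the items are distinct strings and that the list is not empty.

```python
import difflib
import itertools

def least_similar(items):
    """Return the item with the lowest total similarity to all other items."""
    # Assumes distinct items: duplicates would collapse into a single dict key
    score = dict.fromkeys(items, 0.0)
    for w1, w2 in itertools.combinations(items, 2):
        # Compute the ratio once per pair and credit it to both items
        ratio = difflib.SequenceMatcher(a=w1, b=w2).ratio()
        score[w1] += ratio
        score[w2] += ratio
    # The key with the smallest accumulated score is the least similar item
    return min(score, key=score.get)

list_1 = ['taska1', 'taska2', 'taska3', 'taskb2', 'taska7']
print(least_similar(list_1))
```

Note that this compares every pair of items, so the number of comparisons grows quadratically with the length of the list.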