I have a dataset organized into a dictionary of lists, like:
{ UUID: [3, 3, 5, 3, 0, 0, 3, 3, 2, 3, 2, 1, 1, 0, 2, 0, 5, 0, 0, 0, 0, 3, 4, 1, 2],
UUID: [1, 2, 3, 1, 0, 0, 2] }
I want to detect runs of consecutive identical values (especially 0s), in particular runs of at least n identical values.
For example, if n were 3 and the value was 0, I would append the UUID of the first key:value pair to a list of qualifying UUIDs, but not the second.
What's the most efficient way to detect consecutive identical values in this way?
Use itertools.groupby to detect runs of consecutive identical values:
from itertools import groupby

uuids = {'a': [3, 3, 5, 3, 0, 0, 3, 3, 2, 3, 2, 1, 1, 0, 2, 0, 5, 0, 0, 0, 0, 3, 4, 1, 2],
         'b': [1, 2, 3, 1, 0, 0, 2]}

def detect_runs_in_dict(d, n=3):
    # Keep every key whose list contains a run of at least n identical values.
    return [uuid for uuid, val in d.items()  # in Python 2, use .iteritems()
            if any(len(list(g)) >= n for k, g in groupby(val))]
Demo:
detect_runs_in_dict(uuids)
Out[28]: ['a']
detect_runs_in_dict(uuids,n=2)
Out[29]: ['a', 'b']
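To see why this works, it can help to look at what groupby actually yields. The snippet below is only an illustration using the second list from the question; the names values and runs are made up for this example.

```python
from itertools import groupby

values = [1, 2, 3, 1, 0, 0, 2]

# groupby yields one (key, group) pair per run of equal adjacent items;
# len(list(group)) is the length of that run.
runs = [(key, len(list(group))) for key, group in groupby(values)]
print(runs)  # [(1, 1), (2, 1), (3, 1), (1, 1), (0, 2), (2, 1)]
```

The only run longer than one item is (0, 2), which is why 'b' qualifies for n=2 but not for n=3.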
This counts a run of any value. If you only want runs of one specific value, that's straightforward to add:
def detect_runs_in_dict(d, n=3, searchval=0):
    return [uuid for uuid, val in d.items()
            if any(k == searchval and len(list(g)) >= n for k, g in groupby(val))]
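Since the question asks about efficiency, here is a sketch of an alternative that does not use groupby at all: a plain counter that stops as soon as a long enough run is seen and never builds intermediate lists. This is a different technique from the one above, not a claim about which is faster on your data; has_run and detect_runs_with_counter are hypothetical names, and n is assumed to be at least 1.

```python
def has_run(values, n, searchval=0):
    """Return True if values contains at least n consecutive searchval items."""
    count = 0
    for v in values:
        # Extend the current run on a match, otherwise reset it.
        count = count + 1 if v == searchval else 0
        if count >= n:
            return True  # stop early, the rest of the list is irrelevant
    return False

def detect_runs_with_counter(d, n=3, searchval=0):
    return [uuid for uuid, val in d.items() if has_run(val, n, searchval)]

uuids = {'a': [3, 3, 5, 3, 0, 0, 3, 3, 2, 3, 2, 1, 1, 0, 2, 0, 5, 0, 0, 0, 0, 3, 4, 1, 2],
         'b': [1, 2, 3, 1, 0, 0, 2]}

print(detect_runs_with_counter(uuids))       # ['a']
print(detect_runs_with_counter(uuids, n=2))  # ['a', 'b']
```

If in doubt, time both versions on realistic data before picking one.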
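You can use itertools.groupby to get the length of the longest run of a given value (here 0) this way:
from itertools import groupby

# Length of the longest run of 0 in _list; default=0 (Python 3.4+) covers lists with no 0 at all.
max(
    (len(list(group)) for key, group in groupby(_list) if key == 0),
    default=0
)
The length has to be computed while iterating, because each group returned by groupby is an iterator that is used up once it has been read or once groupby moves on to the next group. You can then compare the result with n, or add a length condition to the generator to drop runs shorter than desired.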
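As a rough sketch, the longest-run idea can be wrapped in a function and applied to the whole dictionary from the question. The names longest_run, longest and qualifying are made up for this example, and the default= argument to max assumes Python 3.4 or later.

```python
from itertools import groupby

def longest_run(values, searchval=0):
    """Length of the longest run of searchval in values (0 if it never occurs)."""
    return max(
        (sum(1 for _ in group) for key, group in groupby(values) if key == searchval),
        default=0,
    )

uuids = {'a': [3, 3, 5, 3, 0, 0, 3, 3, 2, 3, 2, 1, 1, 0, 2, 0, 5, 0, 0, 0, 0, 3, 4, 1, 2],
         'b': [1, 2, 3, 1, 0, 0, 2]}

longest = {uuid: longest_run(val) for uuid, val in uuids.items()}
print(longest)     # {'a': 4, 'b': 2}

n = 3
qualifying = [uuid for uuid, length in longest.items() if length >= n]
print(qualifying)  # ['a']
```

Keeping the lengths in a dictionary lets you try different values of n without scanning the lists again.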