
Downsample sublists based on length of smallest sublist

I have a list of lists like the following, and the number and length of sublists can be variable:

test = [[1, 5, 4, 3, 5, 2], [4, 2], [5, 2, 4, 3, 5], [5, 3, 1]]

I want to downsample all sublists to the length of the shortest sublist, which in this case is 2. That means I want to randomly select 2 elements from each sublist as the output.

For a much larger list of around 100 sublists, each longer than 100,000 items, what would be the most efficient way to do this?

You can use a generator expression to find the shortest length, then a list comprehension with random.sample(), like this:

Code:

min_len = min(len(x) for x in data)
[random.sample(x, min_len) for x in data]

Test Code:

import random

data = [[1, 5, 4, 3, 5, 2], [4, 2], [5, 2, 4, 3, 5], [5, 3, 1]]
min_len = min(len(x) for x in data)
print([random.sample(x, min_len) for x in data])

Results:

[[5, 4], [4, 2], [4, 5], [5, 3]]
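Because random.sample() draws without replacement, each output sublist picks distinct positions from its source, and the output differs from run to run. If repeatable results are needed, one option is a dedicated random.Random instance. This is only a minimal sketch: the function name downsample and its seed parameter are hypothetical and not part of the answer above.

```python
import random

def downsample(data, seed=None):
    """Randomly pick len(shortest sublist) elements from every sublist."""
    rng = random.Random(seed)  # private generator; seed=None leaves it unseeded
    min_len = min(len(x) for x in data)
    return [rng.sample(x, min_len) for x in data]

data = [[1, 5, 4, 3, 5, 2], [4, 2], [5, 2, 4, 3, 5], [5, 3, 1]]
result = downsample(data, seed=0)
print(result)
```

Calling downsample(data, seed=0) again should return the same selection, while leaving seed as None behaves like the plain random.sample() version.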

Using only the standard library:

import random

test = [[1, 5, 4, 3, 5, 2], [4, 2], [5, 2, 4, 3, 5], [5, 3, 1]]

min_size = float("inf")

for sublist in test:
    length = len(sublist)
    if length < min_size:
        min_size = length

new_list = [random.sample(sublist, min_size) for sublist in test]

# [[5, 4], [2, 4], [5, 3], [1, 5]]
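random.sample() returns the chosen elements in selection order, not in their original order. If the relative order of the items matters (an assumption, since the question does not say), one possible variation is to sample indices from a range and sort them before picking. The name downsample_keep_order is hypothetical.

```python
import random

def downsample_keep_order(data):
    """Downsample every sublist to the shortest length, keeping original order."""
    min_size = min(len(sublist) for sublist in data)
    result = []
    for sublist in data:
        # Sample positions rather than values, then sort the positions
        idx = sorted(random.sample(range(len(sublist)), min_size))
        result.append([sublist[i] for i in idx])
    return result

test = [[1, 5, 4, 3, 5, 2], [4, 2], [5, 2, 4, 3, 5], [5, 3, 1]]
ordered = downsample_keep_order(test)
print(ordered)
```

Sampling from range(len(sublist)) avoids copying the sublist, at the cost of sorting min_size indices per sublist.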

Another way to do it:

import random
test = [[1, 5, 4, 3, 5, 2], [4, 2], [5, 2, 4, 3, 5], [5, 3, 1]]
# map(len, test) yields each sublist's length; min() takes the smallest
minlen = min(map(len, test))
print([random.sample(i, minlen) for i in test])

Output:

[[3, 5], [4, 2], [5, 3], [1, 3]]

Short and sweet one-liner using a list comprehension (note that len(min(test, key=len)) is re-evaluated for every sublist):

from random import sample

[sample(l, len(min(test, key=len))) for l in test]
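One edge case worth noting: snippets that call min() up front, such as min(len(x) for x in data), raise ValueError when the outer list is empty. A minimal sketch of a guard using the default argument of min(); the function name downsample_safe is hypothetical.

```python
import random

def downsample_safe(data):
    # min() returns default=0 when data contains no sublists at all
    min_len = min(map(len, data), default=0)
    return [random.sample(x, min_len) for x in data]

print(downsample_safe([]))            # prints []
print(downsample_safe([[1, 2], []]))  # prints [[], []]
```

Computing min_len once outside the comprehension also avoids rescanning the outer list for every sublist.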

