Get a random sample of a dict

Question

I'm working with a big dictionary and for some reason I also need to work on small random samples from that dictionary. How can I get this small sample (for example of length 2)?

Here is a toy-model:

dy={'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

I need to perform some task on dy which involves all the entries. Let us say, to simplify, I need to sum together all the values:

s=0
for key in dy.keys():
    s=s+dy[key]

Now, I also need to perform the same task on a random sample of dy; for that I need a random sample of the keys of dy. The simple solution I can imagine is

sam=list(dy.keys())[:2]

In that way I have a list of two keys of the dictionary which are somehow random. So, going back to my task, the only change I need in the code is:

s=0
for key in sam:
    s=s+dy[key]

The point is that I do not fully understand how dy.keys() is constructed, so I can't foresee future issues.
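Since Python 3.7, dictionaries preserve insertion order, so dy.keys() yields the keys in the order they were inserted. Slicing the key list therefore always returns the same first keys, which is not a random sample at all. A minimal sketch contrasting the slice with random.sample (the helper names first_keys and random_keys are hypothetical, for illustration only):

```python
import random

def first_keys(d, n):
    """Hypothetical helper: the first n keys in insertion order (deterministic, not random)."""
    return list(d.keys())[:n]

def random_keys(d, n):
    """Hypothetical helper: n distinct keys chosen uniformly at random, without replacement."""
    # random.sample needs a sequence on Python 3.11+, so the keys are converted to a list.
    return random.sample(list(d), n)

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

print(first_keys(dy, 2))  # always ['a', 'b']

sam = random_keys(dy, 2)  # random; differs between runs
s = sum(dy[key] for key in sam)
print(sam, s)
```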

Answer 1

import random

def sample_from_dict(d, sample=10):
    keys = random.sample(list(d), sample)
    values = [d[k] for k in keys]
    return dict(zip(keys, values))
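A usage sketch on the toy dictionary from the question. random.sample raises ValueError when the requested size exceeds the population, so the default sample=10 fails on a five-entry dict; the min() clamp below is an assumption added for this sketch, not part of the answer.

```python
import random

def sample_from_dict(d, sample=10):
    # Assumption added for this sketch: clamp the size so a request larger than
    # the dict returns every entry instead of raising ValueError.
    sample = min(sample, len(d))
    keys = random.sample(list(d), sample)
    values = [d[k] for k in keys]
    return dict(zip(keys, values))

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

small = sample_from_dict(dy, 2)
print(small, sum(small.values()))  # two random entries and their sum

whole = sample_from_dict(dy)  # the default of 10 is clamped to len(dy) == 5
print(whole == dy)            # True: every entry is returned
```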

Answer 2

Given your example of:

dy = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}

Then the sum of all the values is more simply put as:

s = sum(dy.values())

Then if it's not memory prohibitive, you can sample using:

import random

values = list(dy.values())
s = sum(random.sample(values, 2))

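Or, sampling the keys instead (random.sample used to accept set-like objects such as dy.keys(), but since Python 3.11 the population must be a sequence, so convert the keys to a list first):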

from operator import itemgetter
import random

s = sum(itemgetter(*random.sample(list(dy), 2))(dy))

Or just use:

s = sum(dy[k] for k in random.sample(list(dy), 2))
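One caveat with the itemgetter form: operator.itemgetter returns a tuple only when it is given two or more keys; with a single key it returns the bare value, so sum() would fail for a sample size of 1. A small sketch of a helper (the name sampled_sum is hypothetical) built on the generator form, which works for any sample size:

```python
import random
from operator import itemgetter

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}

print(itemgetter('a', 'b')(dy))  # (1, 2): a tuple for two or more keys
print(itemgetter('a')(dy))       # 1: a bare value for a single key

def sampled_sum(d, k):
    """Hypothetical helper: sum the values of k randomly chosen keys, for any 0 <= k <= len(d)."""
    # list(d) gives the keys as a sequence, which random.sample requires on Python 3.11+.
    return sum(d[key] for key in random.sample(list(d), k))

print(sampled_sum(dy, 1))  # the value of one random key
print(sampled_sum(dy, 5))  # 15: sampling every key sums all values
```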

An alternative is to use heapq, e.g.:

import heapq
import random

s = sum(heapq.nlargest(2, dy.values(), key=lambda L: random.random()))
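The heapq version works because giving every entry an independent random.random() key and keeping the k entries with the largest keys selects each k-subset with equal probability; heapq.nlargest accepts any iterable, so this also works on a generator. The same idea applied to dy.items() returns a sampled sub-dictionary instead of just a sum (sample_items is a hypothetical helper name):

```python
import heapq
import random

def sample_items(d, k):
    """Hypothetical helper: return a dict of k randomly chosen entries from d.

    Each (key, value) pair gets an independent random sort key; keeping the
    k largest sort keys yields a uniform random subset of size k.
    """
    return dict(heapq.nlargest(k, d.items(), key=lambda _item: random.random()))

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
sub = sample_items(dy, 2)
print(sub, sum(sub.values()))
```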

Answer 3

Replace range(10) with a random sample drawn using numpy:

{v:rows[v] for v in [list(rows.keys())[k] for k in range(10)]}
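Following that suggestion, range(10) can be replaced by randomly chosen positions. This sketch uses the standard library's random.sample(range(n), k) rather than numpy (with numpy, numpy.random.default_rng().choice(n, size=k, replace=False) plays the same role). It also builds the key list once instead of on every iteration; rows stands in for the answer's dictionary and k is an assumed sample size.

```python
import random

rows = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}  # stand-in for the answer's dict
k = 2                                            # assumed sample size

keys = list(rows.keys())                        # build the key list once
positions = random.sample(range(len(keys)), k)  # k distinct random indices
sample = {keys[i]: rows[keys[i]] for i in positions}
print(sample)
```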

Answer 4

This should be quicker than creating a new dict and checking if the keys are part of the sample:

import random
sample_n = 1000
output_dict = dict(random.sample(list(input_dict.items()), sample_n))
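For comparison, a self-contained sketch of both approaches on the toy dictionary, with assumed example values for input_dict and sample_n. The first draws (key, value) pairs directly; the second samples keys and then walks every entry of the dictionary to test membership in the sampled set.

```python
import random

input_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}  # assumed example data
sample_n = 2                                          # assumed sample size

# The answer's approach: sample (key, value) pairs and rebuild a dict from them.
# list() is required because random.sample rejects set-like views on Python 3.11+.
output_dict = dict(random.sample(list(input_dict.items()), sample_n))

# The alternative it is compared against: sample keys, then scan the whole dict
# and keep the entries whose key is in the sampled set.
chosen = set(random.sample(list(input_dict), sample_n))
filtered_dict = {k: v for k, v in input_dict.items() if k in chosen}

print(output_dict)
print(filtered_dict)
```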

Answer 5

import random
origin_dict = {'a':1, 'b':2, 'c':3, 'd':4, 'e':5}
sample_rate = 0.4
random_keys = random.sample(list(origin_dict.keys()), int(sample_rate * len(origin_dict)))
random_values = [origin_dict[k] for k in random_keys]

sample_dict = dict(zip(random_keys, random_values))

Example output (the keys are random, so it varies between runs):

{'d': 4, 'c': 3}
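With a fractional rate, the sample size has to be turned into an integer. int() truncates toward zero, so int(0.3 * 5) is int(1.5), which is 1: a rate of 0.3 on five entries yields a single key. A sketch of a hypothetical helper sample_by_rate that rounds instead and clamps the size to the valid range (note that Python's round() sends exact halves to the even integer):

```python
import random

def sample_by_rate(d, rate):
    """Hypothetical helper: return a random sub-dict holding about `rate` of d's entries."""
    # round() instead of int(): 0.3 * 5 == 1.5 rounds to 2 rather than truncating to 1.
    # The result is clamped to 0..len(d) so random.sample never raises ValueError.
    k = min(len(d), max(0, round(rate * len(d))))
    keys = random.sample(list(d), k)
    return {key: d[key] for key in keys}

origin_dict = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
print(int(0.3 * len(origin_dict)))    # 1: int() truncates 1.5
print(round(0.3 * len(origin_dict)))  # 2
print(sample_by_rate(origin_dict, 0.3))
```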

Answer 6

Similar to @J-Mourad's nice answer, but using a dictionary comprehension:

import random

def sample_from_dict(d, n=10):
    keys = random.sample(list(d), n)
    return {k: d[k] for k in keys}
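If the same small sample is needed again (for example, to rerun a calculation on identical data), a dedicated random.Random instance with a fixed seed makes the draw repeatable without touching the global random state. A sketch, where the seed parameter is an addition to the answer's function:

```python
import random

def sample_from_dict(d, n=10, seed=None):
    """Return n random entries of d; the same seed gives the same sample.

    The seed parameter is an assumption added for this sketch.
    """
    rng = random.Random(seed)  # private generator, independent of random.seed()
    keys = rng.sample(list(d), n)
    return {k: d[k] for k in keys}

dy = {'a': 1, 'b': 2, 'c': 3, 'd': 4, 'e': 5}
first = sample_from_dict(dy, 2, seed=42)
second = sample_from_dict(dy, 2, seed=42)
print(first == second)  # True: identical seeds reproduce the sample
```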

6 answers

solution1 5 2020-01-20 20:32:47
solution2 2 ACCEPTED 2016-10-12 15:33:11
solution3 1 2018-05-30 08:55:48
solution4 1 2021-02-02 21:43:15
solution5 0 2019-06-27 13:12:43
solution6 -1 2022-11-15 19:02:39
