简体   繁体   中英

What is the internal load factor of a sets in Python

I am trying to find out what the internal load factor is for the Python sets. For dictionary which uses a hash table with a load factor of 0.66 (2/3) is. The number of buckets start at 8 and when the 6th key is inserted the number of buckets increases to 16 The table below shows the shift in buckets.

bucket shift
8 5
16 10
32 21
64 42
128 85

This can be seen with de following Python code where the size of a dictionary and sets is shows with the getsizeof method:

import sys
d = {}
s = set()

for x in range(25):
    d[x] = 1
    s.add(x)
    print(len(d), sys.getsizeof(d), sys.getsizeof(s))

# of elements memory used for dict memory used for sets
1 232 216
2 232 216
3 232 216
4 232 216
5 232 728
6 360 728
7 360 728
8 360 728
9 360 728
10 360 728
11 640 728
12 640 728
13 640 728
14 640 728
15 640 728
16 640 728
17 640 728
18 640 728
19 640 2264
20 640 2264
21 640 2264
22 1176 2264
23 1176 2264
24 1176 2264
25 1176 2264

The above table shows that the shift in the buckets correct is for the dictionary, but not for the sets. The memory in sets is different.

I am trying to find out what the load factor is for a set. Is that also 2/3? Or am I doing something wrong with the code?

Currently, it's about 3/5. See the source :

if ((size_t)so->fill*5 < mask*3)
    return 0;
return set_table_resize(so, so->used>50000 ? so->used*2 : so->used*4);

fill is the number of occupied table cells (including "deleted entry" markers), and mask is 1 less than the total table capacity.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM