简体   繁体   中英

Dictionary, set or frozenset?

I have a large collection of data, about 10 million entries and part of my program required very many membership checks...

if a in data:
    return True
return False

right now I have data as dictionary entries with all their values equal to '1'

I also have a program that uses an algorithm to figure out the same information, but for now it is slower then the dictionary method however I expect the size of data to continue growing...

For my current dictionary solution, would type(data) as a frozenset, or set (or something else?) be faster?

And for the future to find out when I need to switch to my program, does anyone know how the speed of checking membership correlated with increasing the size of a type that's hashable? Is a dictionary with 1 billion entries still fast?

On Principal

If you expect the data to keep growing you can't use a frozenset.

A set would be smaller than a dictionary storage wise for testing if an element exist in it. It would be similar in speed to a dictionary lookup since the keys and items of a set are both hashed for storage and always unique. If you don't need data associated with the username, use a set.

Practically speaking...

When you are dealing with that many entries move the data to a database . You will eventually run out of memory trying to store and read all of that into memory. With a database, you can issue a specific query to check membership. Seriously. Put that data in a database.

There are several bytes overhead per entry in a hash-able (whether dictionary or set doesn't make much of a difference), so for billions of entries you will run into swapping unless you have 32+Gb of memory for the application. I would start looking for a fast DB

For frozenset you also need to have all data in memory in some acceptable form at creation time, which probably doubles the amount of mem needed

For this amount of data RyPeck is right - a DB will do the job much better.

One more point: Something seems odd to me in what you've written: If you use a dictionary to store the objects of the memberships, what the value of said key-value pair in the dictionary is '1'? Shouldn't the key-value pair of the dictionary be: "id of a"-"a" where 'a' is the object.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM