简体   繁体   English

SQL中集的Python字典

[英]Python dictionary of sets in SQL

I have a dictionary in Python where the keys are integers and the values sets of integers. 我在Python中有一个字典,其中键是整数和整数的值集。 Considering the potential size (millions of key-value pairs, where a set can contain from 1 to several hundreds of integers), I would like to store it in a SQL (?) database, rather than serialize it with pickle to store it and load it back in whenever I need it. 考虑到潜在的大小(数百万个键值对,其中一个集合可以包含1到几百个整数),我想将它存储在SQL(?)数据库中,而不是将其序列化为pickle来存储它和在我需要的时候把它装回去。

From reading around I see two potential ways to do this, both with its downsides: 从阅读中我看到两种可能的方法来做到这一点,两者都有其缺点:

  • Serialize the sets and store them as BLOBs: So I would get an SQL with two columns, the first column are the keys of the dictionary as INTEGER PRIMARY KEY, the second column are the BLOBS, containing a set of integers. 序列化集合并将它们存储为BLOB:所以我会得到一个带有两列的SQL,第一列是字典的键,如INTEGER PRIMARY KEY,第二列是BLOBS,包含一组整数。 Downside: Not able to alter sets anymore without loading the complete BLOB in, and after adding a value to it, serialize it back and insert it back to the database as a BLOB. 缺点:如果不加载完整的BLOB,则无法更改集合,并在向其添加值后,将其序列化并将其作为BLOB插回数据库。

  • Add a unique key for each element of each set: I would get two columns, one with the keys (which are now key_dictionary + index element of set/list), one with one integer value in each row. 为每个集合的每个元素添加一个唯一的键:我会得到两列,一列是键(现在是key_dictionary + set / list的索引元素),每列有一个整数值。 I'd now be able to add values to a "set" without having to load the whole set into python. 我现在能够将值添加到“set”而无需将整个集合加载到python中。 I would have to put more work in keeping track of all the keys. 我需要做更多的工作来跟踪所有的密钥。

In addition, once the database is complete, I will always need sets as a whole, so idea 1 seems to be faster? 另外,一旦数据库完成,我将总是需要集合,所以想法1似乎更快? If I query for all in primary keys BETWEEN certain values, or LIKE certain values, to obtain my whole set in system 2, will the SQL database (sqlite) still work as a hashtable? 如果我在主键中查询某些值(或者某些值,或者某些值),以获得我在系统2中的整个集合,那么SQL数据库(sqlite)是否仍然可以作为哈希表工作? Or will it linearly search for all values that fit my BETWEEN or LIKE search? 或者它会线性搜索适合我的BETWEEN或LIKE搜索的所有值吗?

Overall, what's the best way to tackle this problem? 总的来说,解决这个问题的最佳方法是什么? Obviously, if there's a completely different 3rd way that solves my problems naturally, feel free to suggest it! 显然,如果有一种完全不同的第三种方式可以自然地解决我的问题,请随时提出建议! (haven't found any other solution by searching around) (通过搜索找不到任何其他解决方案)

I'm kind of new to Python and especially to databases, so let me know if my question isn't clear. 我是Python的新手,尤其是数据库,所以如果我的问题不清楚,请告诉我。 :) :)

You second answer is nearly what I would recommend. 你的第二个答案几乎是我推荐的。 What I would do is have three columns: 我要做的是有三列:

  • Set ID 设置名称
  • Key
  • Value

I would then create a composite primary key on the Set ID and Key which guarantees that the combination is unique: 然后,我将在Set ID和Key上创建一个复合主键,以确保组合是唯一的:

CREATE TABLE something (
  set, 
  key, 
  value, 
  PRIMARY KEY (set, key)
);

You can now add a value straight into a particular set (Or update a key in a set) and select all keys in a set. 您现在可以将值直接添加到特定集合中(或更新集合中的键)并选择集合中的所有键。

This being said, your first strategy would be more optimal for read-heavy workloads as the size of the indexes would be smaller. 话虽这么说,您的第一个策略对于读取繁重的工作负载来说会更优化,因为索引的大小会更小。

will the SQL database (sqlite) still work as a hashtable? SQL数据库(sqlite)仍然可以作为哈希表工作吗?

SQL databases tend not to use hashtables. SQL数据库倾向于使用哈希表。 Nor do they usually do a sequential lookup. 它们通常也不进行顺序查找。 What they do is usually create an index (Which tends to be some kind of tree, eg a B-tree) which allows for range lookups (eg where you don't know exactly what keys you're looking for). 他们所做的通常是创建一个索引(它往往是某种树,例如B树),它允许范围查找(例如,你不确切知道你正在寻找什么键)。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM