High performance many-to-many relationships in Python
Given `cluster` and `node` objects:
class Cluster:
    def __init__(self):
        pass

class Node:
    def __init__(self):
        pass
I am wondering what the best data structure or design is that meets the following requirements:
- Get all `clusters` that a given `node` belongs to.
- Get all `nodes` that belong to a given `cluster`.
- … a `node` belongs to a `cluster`, and each `cluster` to a `node`.
- Ensure consistency when a `node` or `cluster` is deleted or added.

The number of nodes and clusters will each be on the order of 100,000.
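One common way to meet these requirements in pure Python is a pair of mirrored dicts, one keyed by node and one keyed by cluster, which are only ever modified together by a single class. The following is a minimal sketch under some assumptions. Nodes and clusters are assumed to be hashable by identity, which is the default for plain classes. The membership weight (for example 0.9 for "90%") is stored as the dict value. `Membership` and its method names are hypothetical and do not come from the question. The sketch also enforces the rules listed in the question's details: an emptied cluster is deleted, and a node is never left without a cluster.

```python
from types import MappingProxyType


class Cluster:
    """Stand-in for the question's Cluster class."""


class Node:
    """Stand-in for the question's Node class."""


class Membership:
    """Hypothetical bidirectional, weighted many-to-many index.

    Two dicts mirror each other, and every mutation updates both:
        _clusters_of[node][cluster] == weight
        _nodes_of[cluster][node] == weight
    """

    def __init__(self):
        self._clusters_of = {}  # node -> {cluster: weight}
        self._nodes_of = {}     # cluster -> {node: weight}

    def clusters_of(self, node):
        """Read-only view of a node's clusters and weights (average O(1) lookup)."""
        return MappingProxyType(self._clusters_of.get(node, {}))

    def nodes_of(self, cluster):
        """Read-only view of a cluster's nodes and weights (average O(1) lookup)."""
        return MappingProxyType(self._nodes_of.get(cluster, {}))

    def all_clusters(self):
        """Clusters that currently have at least one node."""
        return self._nodes_of.keys()

    def link(self, node, cluster, weight=1.0):
        """Add (or re-weight) a node/cluster pair in both directions."""
        self._clusters_of.setdefault(node, {})[cluster] = weight
        self._nodes_of.setdefault(cluster, {})[node] = weight

    def _drop_pair(self, node, cluster):
        # Remove one edge from both views; a cluster left empty is deleted.
        del self._clusters_of[node][cluster]
        members = self._nodes_of[cluster]
        del members[node]
        if not members:
            del self._nodes_of[cluster]

    def unlink(self, node, cluster):
        """Remove one membership, refusing to strip a node's last cluster."""
        clusters = self._clusters_of[node]
        if cluster not in clusters:
            raise KeyError(cluster)
        if len(clusters) == 1:
            raise ValueError("a node must keep at least one cluster")
        self._drop_pair(node, cluster)

    def remove_node(self, node):
        """Delete a node; clusters left without nodes are deleted too."""
        for cluster in list(self._clusters_of[node]):
            self._drop_pair(node, cluster)
        del self._clusters_of[node]

    def remove_cluster(self, cluster):
        """Delete a cluster, unless that would leave some node with no cluster."""
        members = self._nodes_of[cluster]
        # Validate first so a failed call leaves both dicts untouched.
        if any(len(self._clusters_of[n]) == 1 for n in members):
            raise ValueError("removing this cluster would orphan a node")
        for node in list(members):
            self._drop_pair(node, cluster)
```

Both lookups are average O(1) dict accesses. The price is that every membership is stored twice, once in each dict. Routing all changes through one class is what keeps the two views consistent.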
More details of varying relevance:

- A `node` will always belong to one or more clusters.
- A `cluster` will always contain one or more nodes.
- If a `cluster` has its only `node` removed, the cluster should be deleted.
- A `node` will never have all of its clusters removed.
- `node1` might belong 90% to `cluster14` and 10% to `cluster88`.
I was thinking about using SQLite, but the problem is that storing serialized objects in the database is too slow. I could store `object_ids` in the database and then look those up in a `dict` that maps `object_ids` to object instances, but then there are consistency issues between the `dict` and the database. Additionally, fetching a list of instances from the `dict` is a bit cumbersome.
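For comparison, the SQLite-plus-dict approach described above can be sketched with Python's built-in `sqlite3` module and an in-memory database. The table layout, the `nodes_by_id` / `clusters_by_id` registries and the helper functions are all hypothetical. The sketch shows where the consistency burden sits. A deletion has to change the database and both dicts, so here the dicts are only updated after the SQL transaction has committed. Turning id rows back into instances costs one dict lookup per row.

```python
import sqlite3


class Node:
    """Stand-in for the question's Node class."""


class Cluster:
    """Stand-in for the question's Cluster class."""


# Hypothetical registries from integer ids to live Python objects.
nodes_by_id = {}
clusters_by_id = {}

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE membership (
        node_id    INTEGER NOT NULL,
        cluster_id INTEGER NOT NULL,
        weight     REAL    NOT NULL DEFAULT 1.0,
        PRIMARY KEY (node_id, cluster_id)
    );
    -- The primary key serves lookups by node; this index serves lookups by cluster.
    CREATE INDEX membership_by_cluster ON membership (cluster_id, node_id);
    """
)


def link(node_id, cluster_id, weight=1.0):
    """Insert or re-weight one membership row."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO membership (node_id, cluster_id, weight) "
            "VALUES (?, ?, ?)",
            (node_id, cluster_id, weight),
        )


def clusters_of(node_id):
    """Cluster instances for a node: one SQL query, then a dict lookup per row."""
    rows = conn.execute(
        "SELECT cluster_id FROM membership WHERE node_id = ?", (node_id,)
    )
    return [clusters_by_id[cid] for (cid,) in rows]


def nodes_of(cluster_id):
    """Node instances for a cluster."""
    rows = conn.execute(
        "SELECT node_id FROM membership WHERE cluster_id = ?", (cluster_id,)
    )
    return [nodes_by_id[nid] for (nid,) in rows]


def remove_node(node_id):
    """Delete a node and any cluster it leaves empty; returns those cluster ids."""
    with conn:  # commits on success, rolls back if any statement raises
        affected = [
            cid
            for (cid,) in conn.execute(
                "SELECT cluster_id FROM membership WHERE node_id = ?", (node_id,)
            )
        ]
        conn.execute("DELETE FROM membership WHERE node_id = ?", (node_id,))
        emptied = [
            cid
            for cid in affected
            if conn.execute(
                "SELECT 1 FROM membership WHERE cluster_id = ? LIMIT 1", (cid,)
            ).fetchone()
            is None
        ]
    # Only touch the dicts once the database change has committed.
    del nodes_by_id[node_id]
    for cid in emptied:
        del clusters_by_id[cid]
    return emptied
```

Every other mutation (removing a cluster, unlinking one pair) would need the same two-step discipline. That bookkeeping is the consistency cost the question is worried about.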
I could possibly store the memory locations of the instances in SQLite, but that seems dangerous, and we would still have consistency issues.
I implemented a similar data structure in a home project; my own requirements called for an almost identical architecture, except that I called clusters "tags" (the core concept is the same).
Here is how you may implement it:

- For example, if `Node42` belongs to clusters 1 and 3, the dictionary will have an entry looking like `5: [Node42, ...]`.
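A minimal sketch of that dictionary follows. It assumes that cluster n is mapped to the bit 2^(n-1), which would explain the key 5 for clusters 1 and 3 (1 + 4 = 5). `TagIndex` and `cluster_bit` are hypothetical names, and the encoding is an inference from the example rather than something stated. The sketch illustrates the trade-off: a node's clusters come from a single integer, but listing the nodes of one cluster means scanning every distinct key. With around 100,000 clusters the masks become integers of up to 100,000 bits, which Python supports but which are not free to build or test.

```python
from collections import defaultdict


def cluster_bit(cluster_number):
    """Assumed encoding: cluster n -> bit 2**(n - 1), so clusters 1 and 3 -> 0b101 == 5."""
    return 1 << (cluster_number - 1)


class TagIndex:
    """Hypothetical version of the answer's dict: bitmask of clusters -> list of nodes."""

    def __init__(self):
        self.by_mask = defaultdict(list)  # mask -> [node, ...]
        self.mask_of = {}                 # node -> mask, for the reverse lookup

    def add(self, node, cluster_numbers):
        """Index a node that is not already present under its cluster bitmask."""
        mask = 0
        for n in cluster_numbers:
            mask |= cluster_bit(n)
        self.by_mask[mask].append(node)
        self.mask_of[node] = mask

    def clusters_of(self, node):
        """Decode the node's mask back into cluster numbers (ascending)."""
        mask = self.mask_of[node]
        return [i + 1 for i in range(mask.bit_length()) if (mask >> i) & 1]

    def nodes_of(self, cluster_number):
        """Scan every distinct mask, so the cost grows with the number of keys."""
        bit = cluster_bit(cluster_number)
        return [node for mask, nodes in self.by_mask.items() if mask & bit for node in nodes]
```

With this encoding, `TagIndex().add("Node42", [1, 3])` stores `"Node42"` under the key `5`.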
About the requirements:
If you are interested in the code, I can release it for you to have a look, but I think you first need to make a choice or two regarding the architecture. IMHO, you can't have a data structure that is pure Python, constant time, memory efficient and large scale all at once.