保存Web应用程序的动态图结构

Question

I have been searching for an answer to this question for a while, however haven't found a satisfying solution yet. 我一直在寻找这个问题的答案，但是还没有找到令人满意的解决方案。

I am trying to store a dynamic undirected graph structure for a web application. 我正在尝试为Web应用程序存储动态无向图结构。 It should store user "subscriptions" between each other, which can change quite frequently. 它应该在彼此之间存储用户“订阅”，这可能会经常更改。

A traditional database solution makes no sense. 传统的数据库解决方案没有任何意义。 JSON files for each user doesn't seem like the best solution either, for the same reason why the database solution isn't ideal. 出于同样的原因，数据库解决方案也不理想，因此每个用户的JSON文件也不是最佳解决方案。

Any other ideas for an optimal solution for my problem? 还有其他想法可以为我的问题提供最佳解决方案吗？

Thanks in advance! 提前致谢！

Answer 1

The type of structure you are talking about makes most sense in a relational database (which is what I assume you meant by "traditional"). 您所讨论的结构类型在关系数据库中最有意义（这是我认为您所说的“传统”的意思）。 Because you have the subscriptions between users, this is a relationship, thus, a relational database makes most sense. 因为您在用户之间具有订阅，所以这是一种关系，因此关系数据库最有意义。 Relational databases allow explicit connections between different tables. 关系数据库允许不同表之间的显式连接。

A document database (ie one that holds JSON documents) is a very bad idea for this type of data . 对于此类数据 ，文档数据库（即保存JSON文档的数据库）是一个非常糟糕的主意 。 Document databases can be very good at certain things, but data that is highly interdependent (like some sort of subscription system) is a poor use of a document database. 文档数据库在某些方面可能非常擅长，但是高度相互依赖的数据（如某种订阅系统）对文档数据库的使用很差。 I'll explain more as we go along. 我们将继续解释。

You say that your graph edges are undirected, but that the fact that you call them "subscriptions" tells me that they are actually directed: one user subscribes to another. 您说您的图边缘是无向的，但是您称它们为“订阅”这一事实告诉我它们实际上是有向的：一个用户订阅了另一个用户。 If it was undirected, it would be more like friending someone on facebook or connecting on LinkedIn: if I am friends with you, then you must be friends with me. 如果它是无向的，则更像是在Facebook上与某人成为朋友或在LinkedIn上建立联系：如果我与您成为朋友，那么您必须与我成为朋友。 In a subscription system (say like Google+ or Twitter), even if I am subscribed to you, you don't necessarily need to be subscribed to me. 在订阅系统（例如Google+或Twitter）中，即使我订阅了您，也不一定需要订阅我。 If we are both subscribed to each other, then it is actually two directed edges: one to you from me, and one to me from you. 如果我们彼此都订阅，那么实际上是两个方向：一个是我给你的，另一个是你给我的。

Thus, the best solution is to have at least two tables: a primary " users " table, and a secondary " subscriptions " table. 因此，最好的解决方案是至少具有两个表：主“ users ”表和辅助“ subscriptions ”表。 The " users " table would have columns like uid , name , email and so forth. “ users ”表将具有uid ， name ， email等列。 The " subscriptions " table has only two columns: subscriber and subscription . “ subscriptions ”表只有两列： subscriber和subscription 。 Both hold a uid value from the " users " table and every pair of values must be unique in the table. 两者都包含“ users ”表中的uid值，并且每对值在表中必须唯一。

You asked if this would "bloat up" with so many subscriptions. 您问这样的订阅是否会“膨胀”呢？ First off, you are assuming you are going to be the next facebook and will need to deal with millions or billions of users. 首先，假设您将成为下一个Facebook，并且需要与数百万或数十亿的用户打交道。 Don't worry, you won't have that problem, at least at first. 别担心，至少最初不会有这个问题。 Second, most relational databases are logarithmic in their performance for retrieving and inserting records, which scales very nicely as your numbers of users increases. 其次，大多数关系数据库在检索和插入记录的性能上都是对数的，随着用户数量的增加，它的扩展性非常好。 For the type of behavior you are expecting from a document database or JSON files on disk, your behavior will either be of linear time complexity , since you will need to iterate over every single document in the database to be sure you have checked for all subscriptions (linear behavior scales far worse than logarithmic), or you will need to duplicate subscriber/subscription information in all records. 对于您希望从文档数据库或磁盘上的JSON文件获得的行为类型，您的行为将具有线性时间复杂度，因为您将需要遍历数据库中的每个文档以确保已检查所有预订（线性行为的标度远比对数差），否则您将需要在所有记录中重复订阅者/订阅信息。 The second solution would indeed would be bloated since you are duplicating tons of data and, more importantly, it runs the huge risk of getting out of sync quite easily. 第二种解决方案的确会be肿，因为您正在复制大量数据，而且更重要的是，它具有很大的风险，即很容易不同步。 It is far easier to get out of sync in this scenario than you might think. 在这种情况下，不同步比您想象的容易得多。

To show you how to do this, I will use the sqlite3 dialect of SQL . 为了向您展示如何执行此操作，我将使用SQL的sqlite3方言。 It is what I prototype in most, so I am most familiar with it. 这是我大多数情况下的原型，所以我对此最为熟悉。 Converting it to something like MySQL or PostgreSQL should be fairly trivial. 将其转换为MySQL或PostgreSQL之类的东西应该是微不足道的。 Here is the statements to make the databases: 这是制作数据库的语句：

# since `uid` is the primary key, just pass it a
# null value on insertion and the database will 
# generate a unique integer and use that automatically.
# it might also be good to make more than just the uid unique,
# such as their email.
CREATE TABLE users (uid INTEGER PRIMARY KEY,
                    name TEXT,
                    email TEXT);

# we will use the uid for the foreign key reference since this should
# never change, even if the user changes their name or email.
CREATE TABLE subrs (subscriber INTEGER,
                    subscription INTEGER,

                    # make sure each entry of pairs is unique
                    CONSTRAINT uc_edges UNIQUE (subscriber,subscription),

                    # be sure subscribers can only be created for users that exist
                    CONSTRAINT fk_subr FOREIGN KEY (subscriber) REFERENCES users(uid),

                    # be sure subscription can only be created for users that exist
                    CONSTRAINT fk_subee FOREIGN KEY (subscription) REFERENCES users(uid)
                   );

This usually has the nice added benefit that you can't delete users who have subscriptions to them until you first delete those subscriptions. 这通常具有一个很好的附加好处，即您必须先删除这些订阅，才能删除已订阅他们的用户。 Depending on the database you choose, YMMV, so check the docs for your DB of choice. 根据您选择的数据库YMMV，因此请检查您选择的数据库的文档。 Almost all SQL databases also support the behavior with foreign keys that you can not create a record with a value for a foreign key that does not yet exist. 几乎所有的SQL数据库都支持外键的行为，即您无法使用尚不存在的外键值创建记录。 With JSON files or a document database, it is easy to leave dangling subscriptions or to have user deletion take a long time since you need to modify every user document that references a given user. 使用JSON文件或文档数据库，由于需要修改引用给定用户的每个用户文档，因此很容易留下闲置的订阅或使用户删除花费很长时间。 A relational SQL database can simplify a lot of things that would otherwise be done in your code. 关系型SQL数据库可以简化很多事情，而这些事情原本可以在代码中完成。 Handling this logic in your application code introduces many more opportunities for mistakes and bugs with your data handling. 在应用程序代码中处理此逻辑会为数据处理带来更多错误和错误的机会。 A bit of advice: what ever work you can offload onto your database, you should offload onto your database. 一点建议：无论什么工作都可以卸载到数据库上，您应该卸载到数据库上。 Professional databases are far better tested than your code and already have the logic for many of the common sorts of actions you could want to do to data. 专业的数据库比代码要经过更好的测试，并且已经具有可以对数据执行的许多常见操作的逻辑。

To find a user's subscriptions, you would do a query something like the following: 要查找用户的订阅，您可以执行以下查询：

SELECT * FROM subrs WHERE subscriber=some_uid;

To get all the subscribers to a given user, the query is similarly simple: 为了使所有订户都拥有一个给定的用户，查询同样简单：

SELECT * FROM subrs WHERE subscription=some_uid;

To delete a user record would only be about three lines long: 删除用户记录大约只有三行：

DELETE FROM subrs WHERE subscription=some_uid;
DELETE FROM subrs WHERE subscriber=some_uid;
DELETE FROM users WHERE uid=some_uid;

In a document database you would have far more application code to do something pretty similar and you run the risk of your application code having poor logic and breaking your data. 在文档数据库中，您将拥有更多的应用程序代码来执行类似的操作，并且冒着应用程序代码逻辑不佳并破坏数据的风险。

TL;DR TL; DR

Use a relational SQL database. 使用关系SQL数据库。 You can create explicit relationships between records. 您可以在记录之间创建显式关系。 Thus, it won't be as easy to shoot yourself in the foot as it would be with a document database (since all relationships are merely implied). 因此，像使用文档数据库那样，用脚射击自己并不容易（因为仅隐含了所有关系）。 SQL databases like MySQL also tend to scale better, both vertically (ie with more users records) and horizontally (ie with more replica servers). 像MySQL这样的SQL数据库在纵向（即具有更多用户记录）和横向（即具有更多副本服务器）上也倾向于更好地扩展。

保存Web应用程序的动态图结构

问题描述

1 个解决方案

解决方案1
2 已采纳 2015-12-21 22:36:12

保存Web应用程序的动态图结构

问题描述

1 个解决方案

解决方案1 2 已采纳 2015-12-21 22:36:12

解决方案1
2 已采纳 2015-12-21 22:36:12