SQL or NoSQL search?
Let us suppose I have a site with a certain number of users with the following three distinguishing characteristics:
1) The user is part of a network. (The site contains multiple networks.)
2) The user is a 'contact' of a certain number of other site members.
3) Individual documents uploaded by a user may be shared with certain contacts (excluding other contacts).
In this way, a user's document search is unique for each user based upon his or her network, contacts, and additional documents that have been shared with that user. What would be possible ways to address this? Would I need to build a long, unique SQL query for each user for each of his or her searches? I am currently using MySQL as a database. Would it be sufficient, or would I need to move towards a NoSQL option to keep performance comparable to a similar non-filtered search?
A few questions come to mind to help answer this question:
Depending on the answer to those two questions, a relational system could work just fine, which I'm guessing is preferable since you are already using MySQL. I think you could locate the documents for an individual user in a relational system with a few very reasonable queries.
Here is a potential bare-bones schema:
User
--all users in the system
UserId int
NetworkId int (Not sure if this is a 1 to many relationship)
Document
--all documents in the system
DocumentId int
UserId int -- the author
Name varchar
StatusId -- perhaps a flag to indicate whether it is public or not, e.g. shared with everyone in the same network or shared with all contacts
UserDocumentLink
--Linking between a document and the contacts a user has shared the document with
DocumentId
ContactId
UserContact
--A link between a user and all of their contacts
ContactId -- PK identity to represent a link between two users
UserId -- User who owns the contact
ContactUserId --The contact user
Here is a potential "search" query:
-- documents owned by me
SELECT DocumentId
FROM Document
WHERE UserId = @userId
UNION
-- documents shared with me explicitly
SELECT udl.DocumentId
FROM UserContact uc
INNER JOIN UserDocumentLink udl ON uc.ContactId = udl.ContactId
WHERE uc.ContactUserId = @userId
UNION
-- documents shared with me via some public status, using a keyword filter
SELECT d.DocumentId
FROM Document d
INNER JOIN User u ON d.UserId = u.UserId
WHERE u.NetworkId = @userNetworkId
  AND d.StatusId IN ()  -- list the "public" StatusId values here
  AND d.Name LIKE CONCAT('%', @keyword, '%')
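As a rough, runnable sketch of the schema and query above, the following uses Python's built-in sqlite3 module as a stand-in for MySQL. The sample rows, the `PUBLIC_STATUS` value, and the `visible_documents` helper are assumptions made for illustration only; they are not part of the original answer. SQLite concatenates strings with `||` where MySQL would use `CONCAT`.

```python
import sqlite3

# Hypothetical StatusId meaning "visible to everyone in the author's network".
PUBLIC_STATUS = 1

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (UserId INTEGER PRIMARY KEY, NetworkId INTEGER);
CREATE TABLE Document (DocumentId INTEGER PRIMARY KEY, UserId INTEGER,
                       Name TEXT, StatusId INTEGER);
CREATE TABLE UserContact (ContactId INTEGER PRIMARY KEY, UserId INTEGER,
                          ContactUserId INTEGER);
CREATE TABLE UserDocumentLink (DocumentId INTEGER, ContactId INTEGER);

-- Made-up sample data: users 1 and 2 share network 10, user 3 is in network 20.
INSERT INTO User VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO Document VALUES
    (100, 1, 'my notes', 0),
    (101, 2, 'budget draft', 0),
    (102, 2, 'network budget', 1),
    (103, 3, 'other budget', 1),
    (104, 2, 'private memo', 0);
-- User 2 has user 1 as a contact and shares document 101 with that contact.
INSERT INTO UserContact VALUES (1, 2, 1);
INSERT INTO UserDocumentLink VALUES (101, 1);
""")

SEARCH_SQL = """
SELECT DocumentId FROM Document WHERE UserId = :user_id
UNION
SELECT udl.DocumentId
FROM UserContact uc
INNER JOIN UserDocumentLink udl ON uc.ContactId = udl.ContactId
WHERE uc.ContactUserId = :user_id
UNION
SELECT d.DocumentId
FROM Document d
INNER JOIN User u ON d.UserId = u.UserId
WHERE u.NetworkId = :network_id
  AND d.StatusId = :public_status
  AND d.Name LIKE '%' || :keyword || '%'
"""

def visible_documents(conn, user_id, network_id, keyword):
    """Return the sorted ids of documents the user owns, was given, or can see publicly."""
    params = {
        "user_id": user_id,
        "network_id": network_id,
        "public_status": PUBLIC_STATUS,
        "keyword": keyword,
    }
    return sorted(row[0] for row in conn.execute(SEARCH_SQL, params))

print(visible_documents(conn, 1, 10, "budget"))
```

The same parameterized statement serves every user; only the bound values change, so no per-user SQL text has to be generated.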
I think what might be a more influential requirement for schema design is one that is not mentioned in your question: how will users be able to search through documents? And what kind of documents are we talking about here? MySQL is not a good option for full-text search.
It rather depends on what you mean by a "certain number" of users. If you mean a few tens of thousands, then almost any solution can be made to perform adequately. If you mean many millions, then a NoSQL solution may scale up more cheaply and easily.
I suspect that a more general SQL query can be used, rather than a unique one for each user, e.g. selecting documents that belong to users who know the current user, that are marked as being shared with the current user, and that match the search string.
Denormalisation can probably be used (as is common in NoSQL approaches) to improve performance.
However, a graph database (as Peter Neubauer suggests), possibly in combination with a document store (CouchDB, MongoDB or Cassandra), would work very well for this type of problem and would scale well.
As others have pointed out, the number of users and the frequency of requests (traffic volume) must be looked at. Also, how important is redundancy? How likely are people to work on the same documents simultaneously? Are most documents created once and distributed for "read-only" purposes?
NoSQL can help you scale and get redundancy much more easily than an RDBMS for this particular scenario. I am assuming that at some point you will want tagging etc. to be enabled on the documents.
Now, I am wondering if there is any particular reason why you are not looking at an off-the-shelf document management or CMS system for this? I am sure there is a good reason, but it might be worth looking at all of those options too.
I hope this helps. Good luck!
I've managed this using long unique queries in MySQL, as you suggest, for a small-scale social networking project. Nowadays I would suggest using Solr and keeping permission information as a denormalized array of interchangeable keywords on each document. Say each network has a unique recognizable code (e.g. 100N-20000N), and similarly for users and special permission grants. You can store an array of permission keys, like "5515N 43243N 2342N 603U 203PG 44321PG", and treat those as keywords when searching.
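To make the permission-key idea concrete, here is a minimal in-memory sketch in plain Python. It does not use Solr or any Solr API; the `Doc` class, the `user_keys` and `search` helpers, and the sample documents are hypothetical names invented for illustration. The rule it shows is the one described above: a document is visible when at least one of its stored permission keys matches one of the keys held by the searching user.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    doc_id: int
    title: str
    # Denormalized permission keys stored with the document, e.g. "5515N", "603U".
    permission_keys: set = field(default_factory=set)

def user_keys(user_id, network_ids, grant_ids):
    """Build the keys a user holds: own id (U), networks (N), permission grants (PG)."""
    keys = {f"{user_id}U"}
    keys |= {f"{n}N" for n in network_ids}
    keys |= {f"{g}PG" for g in grant_ids}
    return keys

def search(docs, keys, term):
    """Return ids of documents that match the term and share a key with the user."""
    term = term.lower()
    return [
        d.doc_id
        for d in docs
        if d.permission_keys & keys and term in d.title.lower()
    ]

docs = [
    Doc(1, "Quarterly report", set("5515N 43243N 2342N 603U 203PG 44321PG".split())),
    Doc(2, "Team report", {"999N"}),
]

# User 603 in (made-up) network 100 sees document 1 through the "603U" key.
print(search(docs, user_keys(603, [100], []), "report"))
```

In a search engine, the same check would presumably become a filter on a multi-valued keyword field, so the permission test is applied inside the index lookup and not in a separate relational join.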
I would address it with a simple business process solution, which will lead to a simple data schema, a simple query, and therefore performance and scalability:
The SQL query to get documents is simple: select documentid from userdocument where userid=@userid
With a join on the document table, proper indexes, and SQL tuning, it will return all the needed information and it will run fast.
I hope I understood well what you are trying to do.
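A small sketch of how such a `userdocument` table might be maintained, again using Python's sqlite3 module as a stand-in for MySQL. The answer does not spell out the business process, so the assumption here is that every upload or share writes one `(userid, documentid)` row; the `add_document`, `share`, and `documents_for` helpers are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (documentid INTEGER PRIMARY KEY, name TEXT);
-- One row per (user, document) the user may see. The composite primary key
-- doubles as the index used by lookups on userid.
CREATE TABLE userdocument (
    userid INTEGER,
    documentid INTEGER REFERENCES document(documentid),
    PRIMARY KEY (userid, documentid)
);
""")

def share(conn, documentid, userid):
    """Record that a user may see a document; repeated shares are ignored."""
    conn.execute(
        "INSERT OR IGNORE INTO userdocument (userid, documentid) VALUES (?, ?)",
        (userid, documentid),
    )

def add_document(conn, documentid, name, owner_id):
    """Store a document and make it visible to its owner."""
    conn.execute("INSERT INTO document VALUES (?, ?)", (documentid, name))
    share(conn, documentid, owner_id)

def documents_for(conn, userid):
    """The simple lookup from the answer, joined to the document table."""
    return conn.execute(
        """
        SELECT d.documentid, d.name
        FROM userdocument ud
        JOIN document d ON d.documentid = ud.documentid
        WHERE ud.userid = ?
        ORDER BY d.documentid
        """,
        (userid,),
    ).fetchall()

add_document(conn, 1, "spec", owner_id=10)
add_document(conn, 2, "notes", owner_id=10)
share(conn, 1, userid=20)
print(documents_for(conn, 20))
```

The trade-off is that the write path does more work (one row per recipient) so that the read path stays a single indexed lookup.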
-< = one to many
>-< = many to many (will require link table)
Network -< user -< documents >-< contact(user)
v
|
^
contacts(user,user)
This is relational. I don't see a good reason to go NoSQL unless you have a billion users.
Network (unless you can belong to more than one) is an attribute of user.
Contacts will be maintained in the link table user_contact(user,user).
Tables:
documents(doc_id,user_id)
users(user_id)
contacts(user_id,c_user_id) with foreign keys on users
document_contact(doc_id,c_user_id) where a trigger constrains the c_user_id
Then you get a view of all document owners and subscribers (contacts):
CREATE OR REPLACE VIEW user_docs AS
SELECT d.user_id, d.doc_id, 'owner' AS role
FROM documents d
JOIN users u ON d.user_id = u.user_id
UNION
SELECT c.user_id, d.doc_id, 'subscriber' AS role
FROM documents d
JOIN contacts c ON d.user_id = c.c_user_id;
You can then filter the view against the document contacts:
select * from user_docs ud
where
(ud.role = 'owner'
or
ud.doc_id in (select dc.doc_id from document_contact dc where dc.doc_id = ud.doc_id and dc.c_user_id = ud.user_id)
) and ud.user_id = 'me'
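The view and filter can be tried end to end with the sketch below, which uses Python's sqlite3 module in place of MySQL (SQLite has no `CREATE OR REPLACE VIEW`, so a plain `CREATE VIEW` is used). Two assumptions are made here that the answer leaves open: the owner rows are matched on the role label `'owner'` that the view emits, and a `document_contact` row only counts when its `c_user_id` is the user doing the search. The sample users and documents and the `docs_for` helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id TEXT PRIMARY KEY);
CREATE TABLE documents (doc_id INTEGER PRIMARY KEY,
                        user_id TEXT REFERENCES users(user_id));
CREATE TABLE contacts (user_id TEXT, c_user_id TEXT);
CREATE TABLE document_contact (doc_id INTEGER, c_user_id TEXT);

CREATE VIEW user_docs AS
SELECT d.user_id, d.doc_id, 'owner' AS role
FROM documents d
JOIN users u ON d.user_id = u.user_id
UNION
SELECT c.user_id, d.doc_id, 'subscriber' AS role
FROM documents d
JOIN contacts c ON d.user_id = c.c_user_id;

-- Made-up data: 'me' has alice and bob as contacts.
INSERT INTO users VALUES ('me'), ('alice'), ('bob'), ('carol');
INSERT INTO contacts VALUES ('me', 'alice'), ('me', 'bob');
INSERT INTO documents VALUES (1, 'me'), (2, 'alice'), (3, 'alice'), (4, 'bob');
-- alice shares document 2 with 'me'; bob shares document 4 with carol only.
INSERT INTO document_contact VALUES (2, 'me'), (4, 'carol');
""")

def docs_for(conn, user_id):
    """Documents the user owns, plus contacts' documents explicitly shared with the user."""
    return conn.execute(
        """
        SELECT ud.doc_id, ud.role
        FROM user_docs ud
        WHERE (ud.role = 'owner'
               OR EXISTS (SELECT 1 FROM document_contact dc
                          WHERE dc.doc_id = ud.doc_id
                            AND dc.c_user_id = ud.user_id))
          AND ud.user_id = ?
        ORDER BY ud.doc_id
        """,
        (user_id,),
    ).fetchall()

print(docs_for(conn, "me"))
```

Document 3 (owned by a contact but never shared) and document 4 (shared with someone else) are filtered out for `'me'`, which is the behaviour the per-contact sharing requirement in the question asks for.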
I would trade off immediacy against performance when it comes to full-text searching.
I would create a hash table of the user combinations with the documents on a separate thread, usually triggered by an asynchronous call when user associations change.
I then query the hash value plus the other search criteria. This eliminates the need for the long SQL appended at the end of each search, which may cause a lock.