SQL or NoSQL search?

Let us suppose I have a site with a certain number of users, with the following three distinguishing characteristics:

1) The user is part of a network. (The site contains multiple networks.)

2) The user is a 'contact' of a certain number of other site members.

3) Individual documents uploaded by a user may be shared with certain contacts (excluding other contacts).

In this way, a user's document search is unique for each user, based upon his or her network, contacts, and additional documents that have been shared with that user. What would be possible ways to address this? Would I need to append a long unique SQL query for each user for each of his or her searches? I am currently using MySQL as a database. Would this be sufficient, or would I need to move towards a NoSQL option here to maintain the performance of a similar non-filtered search?

A few questions come to mind to help answer this question:

  1. How many documents do you think the average user will have access to? Will many documents in the network be shared for all to see?
  2. How will users be able to find documents, and what do the documents look like? Will they only be able to search by the contact that shared it? By a simple title match? Will they be able to run a full-text search against the document's contents?

Depending on the answers to those two questions, a relational system could work just fine, which I'm guessing is preferable since you are already using MySQL. I think you could locate the documents for an individual user in a relational system with a few very reasonable queries.

Here is a potential bare-bones schema:

User
--all users in the system
UserId int
NetworkId int (Not sure if this is a 1 to many relationship)

Document
--all documents in the system
DocumentId int
UserId int -- the author
Name varchar 
StatusId -- perhaps a flag to indicate whether it is public or not, e.g. shared with everyone in the same network or shared with all contacts

UserDocumentLink
--Linking between a document and the contacts a user has shared the document with
DocumentId
ContactId

UserContact
--A link between a user and all of their contacts
ContactId -- PK identity to represent a link between two users
UserId -- User who owns the contact
ContactUserId --The contact user

Here is a potential "search" query:

-- documents owned by me
SELECT DocumentId
FROM Document
WHERE UserId = @userId

UNION

-- documents shared with me explicitly
SELECT udl.DocumentId
FROM UserContact uc
INNER JOIN UserDocumentLink udl ON uc.ContactId = udl.ContactId
WHERE uc.ContactUserId = @userId

UNION

-- documents shared with me via some public status, using a keyword filter
SELECT d.DocumentId
FROM Document d
INNER JOIN User u ON d.UserId = u.UserId
WHERE u.NetworkId = @userNetworkId
  AND d.StatusId IN ()  -- the list of "public" status ids is left unspecified
  AND d.Name LIKE CONCAT('%', @keyword, '%')  -- CONCAT, since + does not join strings in MySQL
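
To make the three-branch query concrete, here is a minimal runnable sketch using Python's built-in sqlite3 module instead of MySQL (so string concatenation uses `||` rather than `CONCAT`). The sample rows, the `PUBLIC_STATUS` value of 1, and the `visible_documents` helper are all assumptions made up for illustration; they are not part of the schema above.

```python
import sqlite3

PUBLIC_STATUS = 1  # hypothetical: StatusId meaning "shared with the whole network"

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE User (UserId INTEGER PRIMARY KEY, NetworkId INTEGER);
CREATE TABLE Document (DocumentId INTEGER PRIMARY KEY, UserId INTEGER,
                       Name TEXT, StatusId INTEGER);
CREATE TABLE UserContact (ContactId INTEGER PRIMARY KEY, UserId INTEGER,
                          ContactUserId INTEGER);
CREATE TABLE UserDocumentLink (DocumentId INTEGER, ContactId INTEGER);

INSERT INTO User VALUES (1, 10), (2, 10), (3, 20);
INSERT INTO Document VALUES
    (100, 1, 'my notes', 0),        -- owned by user 1, private
    (101, 2, 'budget report', 0),   -- private, shared explicitly with user 1
    (102, 2, 'network report', 1),  -- public within network 10
    (103, 3, 'other report', 1),    -- public, but in network 20
    (104, 2, 'secret plan', 0);     -- private, never shared
INSERT INTO UserContact VALUES (1, 2, 1);  -- user 2 lists user 1 as a contact
INSERT INTO UserDocumentLink VALUES (101, 1);
""")

SEARCH_SQL = """
SELECT DocumentId FROM Document WHERE UserId = :user_id
UNION
SELECT udl.DocumentId
FROM UserContact uc
INNER JOIN UserDocumentLink udl ON uc.ContactId = udl.ContactId
WHERE uc.ContactUserId = :user_id
UNION
SELECT d.DocumentId
FROM Document d
INNER JOIN User u ON d.UserId = u.UserId
WHERE u.NetworkId = :network_id
  AND d.StatusId = :public_status
  AND d.Name LIKE '%' || :keyword || '%'
"""

def visible_documents(conn, user_id, network_id, keyword):
    """Return the sorted ids of documents the user may see."""
    params = {"user_id": user_id, "network_id": network_id,
              "public_status": PUBLIC_STATUS, "keyword": keyword}
    return sorted(row[0] for row in conn.execute(SEARCH_SQL, params))

print(visible_documents(conn, 1, 10, "report"))
```

Note that the same SQL text serves every user; only the bound parameters change. As in the query above, the keyword filter applies only to the third branch, so owned and explicitly shared documents come back regardless of their names.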

I think a more influential requirement for schema design is one that is not mentioned in your question: how will users be able to search through documents? And what kind of documents are we talking about here? MySQL is not a good option for full-text search.

It rather depends on what you mean by a "certain number" of users. If you mean a few tens of thousands, then almost any solution can be made to perform adequately. If you mean many millions, then a NoSQL solution may scale up more cheaply and easily.

I suspect that a more general SQL query can be used, rather than a unique one for each user, e.g. selecting documents that belong to users that know the current user, that are marked as being shared with the current user, and that match the search string.

Denormalisation can probably be used (as is common in NoSQL approaches) to improve performance.

However, a graph database (as Peter Neubauer suggests), possibly in combination with a document store (CouchDB, MongoDB or Cassandra), would work very well for this type of problem and would scale well.

I would take a look at some of the NoSQL solutions; for this interconnected dataset, possibly Neo4j, a graph database. It's even pretty straightforward to query it through Cypher so that you get tabular results back.

As others have pointed out, the number of users and the frequency of requests (traffic volume) must be looked at. Also, how important is redundancy? How likely are people to work on the same documents simultaneously? Are most documents created once and distributed for "read-only" purposes?

NoSQL can help you scale and get redundancy much more easily than an RDBMS for this particular scenario. I am assuming that at some point you will want tagging etc. to be enabled on the documents.

Now, I am wondering if there is any particular reason why you are not looking at an off-the-shelf document management or CMS system for this? I am sure there is a good reason, but it might be worth looking at all of those options too.

I hope this helps. Good luck!

  • Denormalization will give you better read/search performance in this case.
  • Don't normalize users; keep frequently joined entities, like owner and text, in one table.
  • E.g. store the owners' names on the text table itself, next to the owner foreign key, to reduce the number of joins; then you can use SQL freely.
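
As a rough illustration of the bullets above, the sketch below copies the owner's name onto each text row so that a search needs no join. It uses Python's sqlite3 module; the table names `users` and `texts`, the column names, and both helper functions are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY, name TEXT);
-- Denormalized: owner_name is copied onto every row so reads need no join.
CREATE TABLE texts (text_id INTEGER PRIMARY KEY, owner_id INTEGER,
                    owner_name TEXT, body TEXT);
INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
INSERT INTO texts VALUES (10, 1, 'alice', 'quarterly report'),
                         (11, 2, 'bob', 'meeting notes');
""")

def search_texts(conn, keyword):
    """Single-table search: the owner name comes back without joining users."""
    rows = conn.execute(
        "SELECT text_id, owner_name FROM texts WHERE body LIKE ? ORDER BY text_id",
        (f"%{keyword}%",),
    )
    return rows.fetchall()

def rename_user(conn, user_id, new_name):
    """The cost of denormalizing: every copy of the name must be updated."""
    with conn:  # both updates commit together
        conn.execute("UPDATE users SET name = ? WHERE user_id = ?",
                     (new_name, user_id))
        conn.execute("UPDATE texts SET owner_name = ? WHERE owner_id = ?",
                     (new_name, user_id))

print(search_texts(conn, "report"))
```

The trade-off is visible in `rename_user`: reads get cheaper, but a write that changes a copied value has to touch every row that holds the copy.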

I've managed this using long unique queries in MySQL, as you suggest, for a small-scale social networking project. Nowadays I would suggest using Solr and keeping permission information as a denormalized array of interchangeable keywords on each document. Say each network has a unique recognizable code (e.g. 100N to 20000N), and similarly for users and special permission grants. You can store an array of permission keys, like "5515N 43243N 2342N 603U 203PG 44321PG", and treat those as keywords when searching.
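
A small sketch of how the permission keys might be built and matched. It does not talk to Solr at all; it only assembles a query string in the standard `field:(a OR b)` form, which could then be sent along with the search (for example as a filter query). The field name `permissions` and all three helper functions are assumptions for illustration.

```python
def permission_keys(user_id, network_ids, grant_ids):
    """Build the keys a user holds: networks (N), the user (U), grants (PG)."""
    keys = [f"{n}N" for n in network_ids]
    keys.append(f"{user_id}U")
    keys.extend(f"{g}PG" for g in grant_ids)
    return keys

def solr_filter(keys, field="permissions"):
    """Format the keys as one OR clause; `permissions` is a hypothetical field."""
    return f"{field}:({' OR '.join(keys)})"

def can_see(doc_keys, user_keys):
    """Local equivalent of the match: any shared key grants access."""
    return bool(set(doc_keys.split()) & set(user_keys))

keys = permission_keys(603, [5515], [203])
print(solr_filter(keys))
print(can_see("5515N 43243N 2342N 603U 203PG 44321PG", keys))
```

Because a document matches as soon as one key overlaps, sharing with a whole network costs one key on the document rather than one row per member.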

I would address it with a simple business-process solution, which leads to a simple data schema, a simple query, and therefore performance and scalability:

  • Each user has a list of documents. Period.
  • This list is in fact a list of references to documents in a document table (with owner/security information).
  • When a document is shared with another user, the document reference is added to that user's document list (tagged as a shared one if you want), and the user is added to the document's security list (with a permission level, for example).

The SQL query to get the documents is simply: select documentid from userdocument where userid=@userid

With a join on the document table, proper indexes, and SQL tuning, it will return all the needed information and it will run fast.

I hope I understood correctly what you are trying to do.
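
The per-user document list could look roughly like the sketch below, written against Python's sqlite3 module. Only `userdocument`, `documentid`, and `userid` come from the query above; the `document` and `documentsecurity` tables, their columns, and the helper functions are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE document (documentid INTEGER PRIMARY KEY, ownerid INTEGER, name TEXT);
CREATE TABLE userdocument (userid INTEGER, documentid INTEGER,
                           shared INTEGER DEFAULT 0,
                           PRIMARY KEY (userid, documentid));
CREATE TABLE documentsecurity (documentid INTEGER, userid INTEGER, permission TEXT,
                               PRIMARY KEY (documentid, userid));
""")

def add_document(conn, documentid, ownerid, name):
    """Create a document and put it on its owner's list."""
    with conn:
        conn.execute("INSERT INTO document VALUES (?, ?, ?)",
                     (documentid, ownerid, name))
        conn.execute("INSERT INTO userdocument VALUES (?, ?, 0)",
                     (ownerid, documentid))
        conn.execute("INSERT INTO documentsecurity VALUES (?, ?, 'owner')",
                     (documentid, ownerid))

def share_document(conn, documentid, userid, permission="read"):
    """Sharing = one row on the recipient's list plus one security entry."""
    with conn:
        conn.execute("INSERT OR IGNORE INTO userdocument VALUES (?, ?, 1)",
                     (userid, documentid))
        conn.execute("INSERT OR REPLACE INTO documentsecurity VALUES (?, ?, ?)",
                     (documentid, userid, permission))

def list_documents(conn, userid):
    """The simple lookup, joined to the document table for display fields."""
    rows = conn.execute(
        """SELECT d.documentid, d.name, ud.shared
           FROM userdocument ud
           JOIN document d ON d.documentid = ud.documentid
           WHERE ud.userid = ?
           ORDER BY d.documentid""",
        (userid,),
    )
    return rows.fetchall()

add_document(conn, 1, 10, "spec")
add_document(conn, 2, 11, "plan")
share_document(conn, 2, 10)
print(list_documents(conn, 10))
```

The work moves from read time to share time: the search never has to work out who can see what, because that was recorded when the document was shared.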

-<  = one to many
>-< = many to many (will require link table)
Network -< user -< documents >-< contact(user)
            v
            |
            ^
      contacts(user,user)

This is relational; I don't see a good reason to go NoSQL unless you have a billion users.

Network (unless you can belong to more than one) is an attribute of user.

Contacts will be maintained in the link table contacts(user_id, c_user_id).

tables

documents(doc_id, user_id)
users(user_id)
contacts(user_id, c_user_id) with foreign keys on users
document_contact(doc_id, c_user_id) where a trigger constrains the c_user_id

Then you get a view of all document owners and subscribers (contacts):

CREATE OR REPLACE VIEW user_docs AS 
     SELECT d.user_id, d.doc_id, 'owner' AS role
       FROM documents d
     JOIN users u ON d.user_id = u.user_id
UNION 
     SELECT c.user_id, d.doc_id, 'subscriber' AS role
       FROM documents d
     JOIN contacts c ON d.user_id = c.c_user_id;

You can then filter the view against the document contacts:

SELECT *
FROM user_docs ud
WHERE (ud.role = 'owner'  -- the role label defined in the view
       OR ud.doc_id IN (SELECT dc.doc_id
                        FROM document_contact dc
                        WHERE dc.c_user_id = ud.user_id)  -- shared with this user
      )
  AND ud.user_id = 'me';
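
For anyone who wants to try the view-plus-filter idea, here is a small runnable sketch using Python's sqlite3 module. SQLite has no CREATE OR REPLACE VIEW, so it uses a plain CREATE VIEW. The sample rows and the `docs_for` helper are made up, and the filter assumes a shared document should only be visible when document_contact names the viewing user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE users (user_id INTEGER PRIMARY KEY);
CREATE TABLE documents (doc_id INTEGER PRIMARY KEY, user_id INTEGER);
CREATE TABLE contacts (user_id INTEGER, c_user_id INTEGER);
CREATE TABLE document_contact (doc_id INTEGER, c_user_id INTEGER);

CREATE VIEW user_docs AS
    SELECT d.user_id, d.doc_id, 'owner' AS role
    FROM documents d
  UNION
    SELECT c.user_id, d.doc_id, 'subscriber' AS role
    FROM documents d
    JOIN contacts c ON d.user_id = c.c_user_id;

INSERT INTO users VALUES (1), (2), (3);
INSERT INTO documents VALUES (10, 1), (20, 2), (21, 2);
INSERT INTO contacts VALUES (1, 2), (3, 2);   -- users 1 and 3 both know user 2
INSERT INTO document_contact VALUES (20, 1);  -- doc 20 is shared with user 1 only
""")

def docs_for(conn, user_id):
    """Documents the user owns, plus those explicitly shared with them."""
    rows = conn.execute(
        """SELECT DISTINCT ud.doc_id
           FROM user_docs ud
           WHERE ud.user_id = ?
             AND (ud.role = 'owner'
                  OR EXISTS (SELECT 1 FROM document_contact dc
                             WHERE dc.doc_id = ud.doc_id
                               AND dc.c_user_id = ud.user_id))
           ORDER BY ud.doc_id""",
        (user_id,),
    )
    return [row[0] for row in rows]

print(docs_for(conn, 1))
```

User 3 also has user 2 as a contact, but nothing was shared with them, so the filter returns no documents for that user.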

I would trade immediacy for performance when it comes to full-text searching.

I would build a hash table of the user combinations and their documents on a separate thread, usually triggered by an asynchronous call when user associations change.

I then query on the hash value plus the other search criteria. This eliminates the need for the long SQL query at the end, which may cause locking.
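
One possible reading of this, sketched in plain Python: a lookup table from user to visible document ids is rebuilt on a worker thread whenever associations change, and searches only consult the last finished table. The answer does not say what exactly is hashed, so the `AccessIndex` class, its method names, and the in-memory title search are all assumptions rather than the author's actual design.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class AccessIndex:
    """Precomputed user -> visible document ids, refreshed in the background."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs_by_user = {}
        self._executor = ThreadPoolExecutor(max_workers=1)

    def _rebuild(self, owned, shared):
        index = {}
        for doc_id, owner in owned.items():
            index.setdefault(owner, set()).add(doc_id)
        for doc_id, users in shared.items():
            for user in users:
                index.setdefault(user, set()).add(doc_id)
        with self._lock:  # swap in the finished table in one step
            self._docs_by_user = index

    def refresh_async(self, owned, shared):
        """Call when associations change; returns a Future for the rebuild."""
        snapshot = {doc: set(users) for doc, users in shared.items()}
        return self._executor.submit(self._rebuild, dict(owned), snapshot)

    def search(self, user_id, titles, keyword):
        """Filter by keyword among documents the table says are visible."""
        with self._lock:
            allowed = self._docs_by_user.get(user_id, set())
        return sorted(d for d in allowed if keyword in titles[d])

owned = {1: "ann", 2: "bob"}   # doc id -> owner
shared = {2: {"ann"}}          # doc id -> users it is shared with
titles = {1: "tax report", 2: "trip report"}

index = AccessIndex()
index.refresh_async(owned, shared).result()  # wait here only for the demo
print(index.search("ann", titles, "report"))
```

This is where the immediacy trade-off shows up: until a rebuild finishes, searches still see the previous table, so a newly shared document can be briefly missing from results.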

Note: technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please credit this site's URL or the original source.