简体   繁体   English

SQL数据库列中的数组 - 或 - 一个键/值noSQL存储?

[英]Array in a SQL database column -or- One key/value noSQL store?

I thank you in advance for your time. 我提前感谢你的时间。 Your experience sharing is invaluable. 您的经验分享非常宝贵。

Assume a situation where 4 users can submit posts and upvote others posts. 假设有4个用户可以提交帖子并提升其他帖子的情况。 In a database, which from the above approaches will be the best regarding query execution time? 在数据库中,从上述方法来看,查询执行时间最好吗? Is it the 3 columns SQL database or the 2 columns NoSQL key/value store? 它是3列SQL数据库还是2列NoSQL键/值存储?

An example of querying: get upvoters of post_3. 查询的一个例子:获得post_3的upvoters。

Thank you 谢谢

first approach (MySQL for example) 第一种方法(例如MySQL)

第一种方法

second approach 第二种方法

第二个认可

Edit(based only on John Tseng answer) 编辑(仅基于John Tseng的回答)

create table post (post_id int, author_id int); create table upvote (user_id int, post_id int).

Should I conclude that: 我应该得出这样的结论:

2 tables (=3 reads) is better than long string read from one table processed with php? 2个表(= 3个读取)是否比从php处理的一个表读取长字符串更好?

I want to note that optimizing the speed (or at least learning howto) before implementing the project is something I feel important. 我想要指出,在实施项目之前优化速度(或至少学习如何)是我觉得重要的事情。 I dont want that my tables become TBs in the future and performing slow, because redesign will be risky and need big work. 我不希望我的桌子在未来成为TB并且表现缓慢,因为重新设计将是有风险的并且需要大量工作。 I am scared of moving one big column from a table to another. 我害怕将一个大柱子从一张桌子移到另一张桌子上。 That's why I am spending a lot of time to find the best design for my database and asking from the beginning if is it good to use key/value store. 这就是为什么我花了很多时间为我的数据库找到最好的设计,并从一开始就询问是否使用键/值存储是好的。 Do I missunderstand key-value stores, are they not used at the beginning on the first machine. 我是否会错过理解键值存储,是否在第一台机器上没有使用它们。

The way you store your data is very important, and you should make it logical before optimizing for speed.

Why not to create a logical design which will be easily optimized for speed in the future? 为什么不创建一个可以在未来轻松优化速度的逻辑设计?

From your invaluable answer do I need to understand that good design of SQL database is the only need and use of Key-Value store will be easy after. 从您宝贵的答案中我需要了解的是,SQL数据库的良好设计是唯一需要和使用Key-Value存储后会很容易的。

If you are asking for the query execution time for this one query, then it's definitely the schema where only one read is necessary (first approach). 如果您要求查询这一查询的查询执行时间,那么它肯定是只需要一次读取的模式(第一种方法)。

However, I don't think this is a good way to store your data. 但是,我不认为这是存储数据的好方法。 What if some day, you want to query for: does user 3 upvote any of the posts by user 1? 如果有一天你想要查询:用户3是否支持用户1的任何帖子怎么办?

I would make two tables: 我会做两张桌子:

create table post (post_id int, author_id int); create table post(post_id int,author_id int); create table upvote (user_id int, post_id int); create table upvote(user_id int,post_id int);

This will make your example query slower, but I think you should not be optimizing for speed. 这将使您的示例查询更慢,但我认为您不应该优化速度。 The way you store your data is very important, and you should make it logical before optimizing for speed. 存储数据的方式非常重要,您应该在优化速度之前使其成为逻辑。 You will have a large arsenal of ways to make your database faster without resorting to storing your data in odd ways. 您将拥有大量的方法来提高数据库的速度, 而无需以奇怪的方式存储数据。

Sure when you get to thousands of requests per second, you will want to do some denormalization and NoSQL, but I think you should stay away from those kinds of things in your initial design. 当你每秒达到数千个请求时,你会想要做一些非规范化和NoSQL,但我认为你应该远离初始设计中的那些东西。

When people say NoSQL doesn't scale, they mean databases in the order of TBs and requests in the order of thousands per second. 当人们说NoSQL不能扩展时,它们意味着数据库按TB的顺序排列,请求数量级为每秒数千。 In a few years, they will mean tens of TBs, and ten thousand requests per second. 在几年内,它们将意味着数十TB,每秒一万次请求。 It is already possible to make relational databases handle tens of TBs and ten thousand requests per second with very beefy machines. 已经可以使关系数据库使用非常强大的机器来处理数十TB和每秒一万个请求。 It just will be a little more affordable in a few years. 几年后它会更加实惠。

In conclusion, make your initial design in a relational database, normalizing everything textbook style. 总之,在关系数据库中进行初始设计,规范化所有教科书风格。 When you start to max out on RAM (after other relational database optimizations), then you can think about these issues. 当您开始最大化RAM(在其他关系数据库优化之后),那么您可以考虑这些问题。

EDIT 编辑

For this one query, it is true that reading one row from the table is faster. 对于这一个查询,确实从表中读取一行更快。 However, a good database design should be flexible enough to accomodate multiple queries, not just one. 但是,良好的数据库设计应该足够灵活,以容纳多个查询,而不仅仅是一个。 In your case, once you have thousands of followers or even millions of upvotes, you will have some serious issues. 在你的情况下,一旦你有成千上万的追随者甚至数百万的赞成者,你将遇到一些严重的问题。 To modify an individual upvote, you will need to do a large amount of processing. 要修改单个upvote,您需要进行大量处理。 To find any individual update, you need to parse the whole string. 要查找任何单个更新,您需要解析整个字符串。

While I understand your desire to optimize for speed right at the beginning, many people's experiences have shown that it is much better to have logical code than code that centers purely on performance metrics. 虽然我理解你一开始就想要优化速度的愿望,但很多人的经验表明,拥有逻辑代码比纯粹以性能指标为中心的代码要好得多。 As we have seen in this short example, because you speed up this query, you are dramatically slowing down other queries. 正如我们在这个简短的例子中看到的那样,因为你加快了这个查询的速度,所以你大大减慢了其他查询的速度。

Also, it is much easier to understand SQL structures than NoSQL structures. 此外,理解SQL结构要比NoSQL结构容易得多。 You should definitely learn to do things the SQL way before you learn the NoSQL way. 在学习NoSQL方法之前,你一定要学习SQL方法。 One reason is that NoSQL is anything that's not SQL, and there's a ton of choices. 一个原因是NoSQL 不是 SQL,而且有很多选择。

Back to this particular query. 回到这个特定的查询。 With the right index, the query will issue three successive reads, which will have very similar performance to a single read. 使用正确的索引,查询将发出三次连续读取,这将与单次读取具有非常相似的性能。 I think the right way to speed up this query is not to store the data in your database in an awkward way, but to leave the awkward storage to your caching layer. 我认为加快此查询的正确方法不是以笨拙的方式将数据存储在数据库中,而是将尴尬的存储留给缓存层。 You would query the database for the correct answer to the query, and then store that in your cache. 您将在数据库中查询查询的正确答案,然后将其存储在缓存中。 Every subsequent read would the get the data from the cache. 每次后续读取都将从缓存中获取数据。 The cache would be denormalized like: 缓存将被非规范化,如:

key: Post_3_upvotes value: [1, 3]

The difference between this and storing all your data in a key/value store is that modifications would be done to the relational database which will be good at modifications. 这与将所有数据存储在键/值存储中的区别在于,将对关系数据库进行修改,这将对修改很有帮助。 Modifying the key/value store with many simultaneous users will get you into bigger issues than performance. 使用许多同时用户修改键/值存储将使您遇到比性能更大的问题。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

相关问题 数据库中存储多个主键; sql vs nosql - Database to store more than one primary key; sql vs nosql 如何在SQL中获取另一列等于值的一列中的值,并存储到PHP数组中? - How to get values in one column where another column equals a value with SQL, and store into an array in PHP? 数据从MySQL同步到NoSQL键值存储 - Data sync from MySQL to NoSQL key-value store 如何在数据库列中存储数组 - How to store an array in a database column 在NoSQL DB中存储一列? - Storing one column in a NoSQL DB? foreach array as key => value ..在每个列名中插入一个值作为数组键全部在一行中 - foreach array as key => value .. insert a value in each column name as array key all in one row 在SQL数据库中存储时间戳值 - Store timestamp value in SQL Database 通过将大型密钥值存储从MySQL迁移到NoSQL DB,我可以期待显着的性能提升吗? - Can I expect a significant performance boost by moving a large key value store from MySQL to a NoSQL DB? SQL数据库 - 为另一个表列获取一列最大值 - SQL Database - Taking one columns max value for another table column 如何将多个图像路径存储在一个数组中以将值存储在数据库中的一行中,以便为php上传多个图像 - how to store multiple image path in one array to store in database the value in a single row for multiple images uploads for php
 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM