Which is faster in SQL: many Many MANY tables vs one huge table?

I am in the process of creating a website where I need to store each user's activity (similar to your inbox on Stack Overflow) in SQL. Currently, my teammates and I are arguing over the most effective way to do this; so far, we have come up with two alternatives:

  1. Create a new table for each user and name it theirusername_activity. Then, when I need a user's activity (posting, being commented on, etc.), I simply read the rows of that table.
    • In the end I will have a TON of tables
    • Possibly faster
  2. Have one huge table called activity, with an extra field for the username; when I want a user's activity I simply get the rows from that table with "...WHERE username=".$loggedInUser
    • Fewer tables, cleaner
    • (Assuming I index the table correctly, will this still be slower?)

Any alternative methods would also be appreciated.

"Create a new table for each user... In the end I will have a TON of tables"

That is never a good way to use relational databases.

SQL databases can cope perfectly well with millions of rows (and more), even on commodity hardware. As you have already mentioned, you will need usable indexes that cover all the queries that will be performed on this table.
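To make the "one table plus a usable index" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The table layout, the column names (action, created_at) and the index name are assumptions made for illustration; the question only mentions a username field. It also passes the username as a bound parameter instead of concatenating it into the SQL string.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One shared table for every user's activity (column names are assumed).
conn.execute(
    "CREATE TABLE activity ("
    " id INTEGER PRIMARY KEY,"
    " username TEXT NOT NULL,"
    " action TEXT NOT NULL,"
    " created_at INTEGER NOT NULL)"
)
# Composite index matching the lookup: filter by user, newest first.
conn.execute(
    "CREATE INDEX idx_activity_user_time ON activity (username, created_at)"
)

# 50,000 rows spread over 1,000 hypothetical users.
rows = [(f"user{i % 1000}", "posted", i) for i in range(50_000)]
conn.executemany(
    "INSERT INTO activity (username, action, created_at) VALUES (?, ?, ?)",
    rows,
)


def recent_activity(conn, username, limit=10):
    """Return the newest activity rows for one user via bound parameters."""
    return conn.execute(
        "SELECT action, created_at FROM activity"
        " WHERE username = ? ORDER BY created_at DESC LIMIT ?",
        (username, limit),
    ).fetchall()


# Ask SQLite how it would run the lookup; the last column is the plan text.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT action, created_at FROM activity"
    " WHERE username = ? ORDER BY created_at DESC LIMIT 10",
    ("user42",),
).fetchall()
plan_text = " ".join(row[3] for row in plan)
print(plan_text)
print(recent_activity(conn, "user42", 3))
```

The printed plan should mention the index rather than a full table scan, which is the property the answer above relies on. MySQL's EXPLAIN output looks different, but the same check applies there.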

Number 1 is just plain crazy. Can you imagine having to manage it, and seeing all those tables?

Can you imagine the backup? Or the dump? That many CREATE TABLE statements... that would be crazy.

Get yourself a good index, and you will have no problem sorting through the records.

Here we are talking about MySQL. So why would it be faster to make separate tables?

  • Query cache efficiency: an insert from one user wouldn't empty the query cache for the others
  • Memory & pagination: tables in use would fit in the buffers, and unused data would simply not be loaded there

But as everybody here said, it seems quite crazy in terms of management. And in terms of performance, having a lot of tables adds another problem in MySQL: you may run out of file descriptors or simply wipe out your table cache.

It may be more important here to choose the right engine, such as MyISAM instead of InnoDB, as this is an insert-only table. And as @RC said, a good partitioning policy would fix the memory and pagination problem by keeping rarely used data out of the active memory buffers. This should go together with an intelligent application design that avoids loading the whole activity history by default: if you show only recent activity and restrict scans of the complete history table to batch processes and advanced screens, partitioning will have a nice effect. You can even try a user-based partitioning policy.

As for query cache efficiency, you'll gain more by using an application-level cache (like memcache), storing the per-user history there and emptying that user's entry at each new insert.
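The per-user cache described above might look roughly like the following sketch. A plain dict stands in for memcache and a list stands in for the activity table; the class and method names are hypothetical and not taken from any library.

```python
class ActivityStore:
    """Toy activity store with a per-user history cache (illustrative only)."""

    def __init__(self):
        self._rows = []    # stand-in for the single activity table
        self._cache = {}   # stand-in for memcache: username -> history list
        self.db_reads = 0  # counts how often the "table" is actually read

    def add(self, username, action):
        self._rows.append((username, action))
        # Invalidate only this user's cached history; others stay cached.
        self._cache.pop(username, None)

    def history(self, username):
        if username not in self._cache:
            self.db_reads += 1
            self._cache[username] = [
                action for user, action in self._rows if user == username
            ]
        return self._cache[username]


store = ActivityStore()
store.add("alice", "posted")
store.add("bob", "commented")
first = store.history("alice")    # miss: reads the table
store.add("bob", "posted")        # does not touch alice's cache entry
second = store.history("alice")   # hit: served from the cache
store.add("alice", "commented")   # invalidates alice's entry only
third = store.history("alice")    # miss again: reloaded with the new row
```

The point of the design is that a write by one user only evicts that user's entry, which is the benefit the per-user-table idea was hoping to get from the MySQL query cache.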

In some cases the first option is, in spite of not being strictly "the relational way", slightly better, because it makes it simpler to shard your database across multiple servers as you grow. (Doing this is precisely what allows wordpress.com to scale to millions of blogs.)

The key is to only do this with tables that are entirely independent from one user to the next, i.e. never queried together.

In your case, option 2 makes the most sense: you'll almost certainly want to query the activity across all or some users at some point.

You want the second option, and you add a userId to it (and possibly a separate table for userId, username, etc.).

If you do a lookup on that id on a properly indexed field, you'd only need something like log(n) steps to find your rows. This is hardly anything at all. It will be way faster, way clearer and way better than option 1. Option 1 is just silly.
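To put a number on "something like log(n) steps", here is a small sketch that counts the comparisons of a binary search over a sorted list of ids. It only illustrates the growth rate: real database indexes are typically B-trees with a large fan-out, so the number of pages touched per lookup is usually smaller still. The function name and the sample sizes are made up for the example.

```python
import math


def find_with_steps(sorted_keys, target):
    """Binary search that also reports how many comparisons it made."""
    lo, hi, steps = 0, len(sorted_keys), 0
    while lo < hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(sorted_keys) and sorted_keys[lo] == target
    return found, steps


n = 1_000_000
keys = list(range(n))  # stand-in for an indexed id column
found, steps = find_with_steps(keys, 765_432)
bound = math.floor(math.log2(n)) + 1  # worst case for this search
print(found, steps, bound)

# The bound grows very slowly as the table grows.
for size in (1_000, 1_000_000, 1_000_000_000):
    print(size, math.floor(math.log2(size)) + 1)
```

Going from a thousand rows to a billion rows only triples the worst-case step count, which is why a single large indexed table stays fast.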

Use option 2, and not only index the username column but also partition on it (consider a hash partition). Partitioning on username gives you some of the same benefits as the first option while letting you keep your sanity. Partitioning and indexing the column this way provides a very fast and efficient means of accessing data by username/user_key. When querying a partitioned table, the SQL engine can immediately lop off the partitions it doesn't need to scan, because it can tell from the queried username value which partition that username can reside in (in this case only one partition can contain records tied to that user). If you need to shard the table across multiple servers in the future, partitioning doesn't hinder that.
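The pruning idea can be illustrated outside the database with a toy model: hash the username, take it modulo the number of partitions, and only ever look in that one bucket. This is a sketch of the principle, not of how any particular engine hashes; the partition count and the use of CRC32 are assumptions chosen for the example.

```python
import zlib

NUM_PARTITIONS = 8  # assumed partition count for the illustration

# Each list plays the role of one physical partition of the activity table.
partitions = [[] for _ in range(NUM_PARTITIONS)]


def partition_for(username):
    """Map a username to exactly one partition (CRC32 is an arbitrary choice)."""
    return zlib.crc32(username.encode("utf-8")) % NUM_PARTITIONS


def insert(username, action):
    partitions[partition_for(username)].append((username, action))


def activity_for(username):
    # "Pruning": only the single partition that can hold this user is scanned.
    bucket = partitions[partition_for(username)]
    return [action for user, action in bucket if user == username]


insert("alice", "posted")
insert("bob", "commented")
insert("alice", "commented")

print(partition_for("alice"), activity_for("alice"))
```

Because the hash is deterministic, all of one user's rows land in the same partition, so a lookup by username never has to touch the other seven.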

You will also want to normalize the table by moving the username field (and any other columns that depend on the username) into its own table keyed by a user_key. Make sure user_key is the primary key of that username table.
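A normalized layout along those lines could be sketched as follows, again with sqlite3 for the sake of a runnable example. The table name users, the action column and the index name are assumptions; only user_key and username come from the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this per connection

conn.executescript(
    """
    CREATE TABLE users (
        user_key INTEGER PRIMARY KEY,
        username TEXT NOT NULL UNIQUE
    );
    CREATE TABLE activity (
        id       INTEGER PRIMARY KEY,
        user_key INTEGER NOT NULL REFERENCES users(user_key),
        action   TEXT NOT NULL
    );
    CREATE INDEX idx_activity_user_key ON activity (user_key);
    """
)

conn.execute("INSERT INTO users (user_key, username) VALUES (1, 'alice')")
conn.execute("INSERT INTO users (user_key, username) VALUES (2, 'bob')")
conn.executemany(
    "INSERT INTO activity (user_key, action) VALUES (?, ?)",
    [(1, "posted"), (2, "commented"), (1, "commented")],
)


def activity_by_username(conn, username):
    """Look the user up by name, then fetch activity through user_key."""
    return [
        row[0]
        for row in conn.execute(
            "SELECT a.action FROM activity AS a"
            " JOIN users AS u ON u.user_key = a.user_key"
            " WHERE u.username = ? ORDER BY a.id",
            (username,),
        )
    ]


before = activity_by_username(conn, "alice")

# Renaming a user touches one row in users; the activity rows are unchanged.
conn.execute("UPDATE users SET username = 'alice2' WHERE user_key = 1")
after = activity_by_username(conn, "alice2")
print(before, after)
```

Keeping the name in one place means a rename is a single-row update, and the activity table stores a compact integer key instead of repeating the username on every row.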

This mainly depends on where you need to retrieve the values. If it's a page for a single user, use the first approach. If you are showing data for all users, use a single table. The multiple-table approach is also clean, whereas in SQL, if the number of records in a single table is very high, data retrieval is very slow.
