
Many (many) SQL JOINs vs Multiple queries

I'm here to ask a question that many of you have probably already asked yourselves. I am creating a PHP website, and everything ran smoothly until I decided to populate my database with some test data (real data, which will be even bigger once the application is used for real). Most things still work fine, but one particular (and really important) feature started taking three to four seconds to execute, and most of that time is spent in the MySQL server.

Here's the deal: I'm building an application for a school, and it needs to have all the schedules and lessons for every day, every person, every room, every class. The structure of the database is done, the indexes are created, etc. The problem is that since all this data is relational (and can be spread across many tables), one query to get it all might look like this:

SELECT field1, field2, etc
FROM schedules AS su
LEFT JOIN schedules_lessons AS sul
    ON sul.ID_SCHEDULE = su.ID
LEFT JOIN schedules_lessons_teachers AS sult
    ON sult.ID_LESSON = sul.ID
LEFT JOIN users AS u
    ON u.ID = sult.ID_TEACHER
LEFT JOIN schedules_periods AS sup
    ON sup.ID_SCHEDULE = su.ID
LEFT JOIN schedules_periods AS sulp
    ON sulp.ID_SCHEDULE = sul.ID_SCHEDULE AND sulp.period = sul.period
LEFT JOIN schools AS s
    ON s.ID = su.ID_SCHOOL
LEFT JOIN schools_buildings AS sb
    ON sb.ID_SCHOOL = s.ID
LEFT JOIN schools_rooms AS sr
    ON sr.ID = sul.ID_ROOM
LEFT JOIN schools_classes AS sc
    ON sc.ID = sul.ID_CLASS

Yeah, that's a lot of joins, I know. My question is: how should I strike the best balance between the number of joins and the number of queries? I feel like this could be greatly improved, but I'm not sure how to achieve it.

Most of the tables will have fewer than 200 records; only the lessons table can have many more. Its minimum is somewhere near 5k, and the maximum can be around 30k or more.

If you need this information and the tables are properly indexed, then your join query should be a very reasonable way to extract the data. You can check whether the indexes are being used by adding explain before the query.
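As a rough, self-contained sketch of that check, the snippet below uses Python's built-in sqlite3 module as a stand-in, since a MySQL server cannot be assumed here. SQLite spells the command EXPLAIN QUERY PLAN, whereas in MySQL you would simply prefix the query with EXPLAIN. The column lists and the index name idx_sul_schedule are assumptions for illustration; only the table names and join columns come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedules (ID INTEGER PRIMARY KEY, ID_SCHOOL INTEGER);
    CREATE TABLE schedules_lessons (
        ID INTEGER PRIMARY KEY, ID_SCHEDULE INTEGER, period INTEGER
    );
    -- Hypothetical index name; the question only says that indexes exist.
    CREATE INDEX idx_sul_schedule ON schedules_lessons (ID_SCHEDULE);
""")

# A cut-down version of the join from the question.
query = """
    SELECT su.ID, sul.period
    FROM schedules AS su
    LEFT JOIN schedules_lessons AS sul ON sul.ID_SCHEDULE = su.ID
"""

# Each plan row is a 4-tuple; the last element is the human-readable step.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
for step in plan:
    print(step)

# True when the lookup into schedules_lessons goes through the index.
index_used = any("idx_sul_schedule" in step for step in plan)
print("index used:", index_used)
```

The exact wording of the plan steps varies between SQLite versions, and MySQL's EXPLAIN output looks different again (there you would look at the key column), so treat this only as an illustration of the idea.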

When you say "most of [the] time is spent in MySQL server", are you taking into account that returning thousands of rows takes time? You might try running the same query, but replacing the select . . . with select count(*) to see what the underlying query performance is. Another way would be to add order by <something> limit 1 to the existing query; the order by has to fully process the query before returning a result.
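A minimal sketch of those two measurements, again using Python's sqlite3 module with made-up data as a stand-in for the real MySQL database. The row counts, column lists and the timed helper are assumptions for illustration, and timings from an in-memory SQLite database say nothing about the real server; the point is only the shape of the comparison.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedules (ID INTEGER PRIMARY KEY);
    CREATE TABLE schedules_lessons (
        ID INTEGER PRIMARY KEY, ID_SCHEDULE INTEGER, period INTEGER
    );
    CREATE INDEX idx_sul_schedule ON schedules_lessons (ID_SCHEDULE);
""")
# Made-up test data: 10 schedules, 5000 lessons spread evenly across them.
conn.executemany("INSERT INTO schedules (ID) VALUES (?)",
                 [(i,) for i in range(1, 11)])
conn.executemany(
    "INSERT INTO schedules_lessons (ID_SCHEDULE, period) VALUES (?, ?)",
    [(i % 10 + 1, i % 8) for i in range(5000)],
)

joins = """
    FROM schedules AS su
    LEFT JOIN schedules_lessons AS sul ON sul.ID_SCHEDULE = su.ID
"""

def timed(sql):
    """Run a query, fetch every row, and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start

# 1. The full query: join work plus the cost of returning every row.
full_rows, full_secs = timed("SELECT su.ID, sul.ID, sul.period" + joins)
# 2. Same joins, but only a single number comes back.
count_rows, count_secs = timed("SELECT COUNT(*)" + joins)
# 3. Same joins, sorted and cut to one row.
one_row, one_secs = timed(
    "SELECT su.ID, sul.ID, sul.period" + joins + " ORDER BY sul.period LIMIT 1"
)

print(f"full query: {len(full_rows)} rows in {full_secs:.4f}s")
print(f"count(*):   {count_rows[0][0]} in {count_secs:.4f}s")
print(f"limit 1:    {len(one_row)} row in {one_secs:.4f}s")
```

If the count(*) and limit 1 variants are fast while the full query is slow, the time is probably going into transferring and processing rows rather than into the joins themselves.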

Finally, if this has only just started to be a problem, what has changed since it worked the way you wanted it to?

I'm not a database expert, but maybe it makes sense to query only the information your app or web page currently needs. That should be possible in a reasonably short time, I guess. The rest can then be queried from the database when it's actually needed.
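One possible reading of this suggestion, sketched with Python's sqlite3 module as a stand-in for MySQL: load only the lessons of the schedule being displayed, and look up teacher names in a second query when the page actually needs them. The function names, the users.name column and the sample rows are hypothetical; the table names and ID columns follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE schedules_lessons (
        ID INTEGER PRIMARY KEY, ID_SCHEDULE INTEGER, period INTEGER
    );
    CREATE TABLE schedules_lessons_teachers (ID_LESSON INTEGER, ID_TEACHER INTEGER);
    CREATE TABLE users (ID INTEGER PRIMARY KEY, name TEXT);
    -- Made-up sample rows.
    INSERT INTO users (ID, name) VALUES (1, 'Teacher A'), (2, 'Teacher B');
    INSERT INTO schedules_lessons (ID, ID_SCHEDULE, period)
        VALUES (10, 1, 1), (11, 1, 2), (12, 2, 1);
    INSERT INTO schedules_lessons_teachers (ID_LESSON, ID_TEACHER)
        VALUES (10, 1), (11, 2), (12, 1);
""")

def lessons_for_schedule(conn, schedule_id):
    """First query: only the lessons of the schedule being displayed."""
    return conn.execute(
        "SELECT ID, period FROM schedules_lessons "
        "WHERE ID_SCHEDULE = ? ORDER BY period",
        (schedule_id,),
    ).fetchall()

def teachers_for_lesson(conn, lesson_id):
    """Second query: run only when teacher names are actually needed."""
    return conn.execute(
        """SELECT u.name
           FROM schedules_lessons_teachers AS sult
           JOIN users AS u ON u.ID = sult.ID_TEACHER
           WHERE sult.ID_LESSON = ?""",
        (lesson_id,),
    ).fetchall()

lessons = lessons_for_schedule(conn, 1)
teachers = teachers_for_lesson(conn, lessons[0][0])
print(lessons)
print(teachers)
```

The trade-off is more round trips to the database, so this only pays off when the deferred data is needed for a small part of what is shown.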

Please note that the database server may have to build a large intermediate result in memory where all the joins are merged. If your server has too little memory, it might have difficulty building it. (Although that is probably not the case in your scenario...)

As much as possible you should let the database handle the joins and avoid making more queries than necessary. In theory this should be optimal. Your query seems fine provided all the join fields are indexed. The stated volumes are nothing spectacular, and response times should be fine (once again, provided all the indexes are created). Bear in mind that you should rarely, if ever, have queries that return many records (reports being an exception, of course); in the application you should control this with pagination.
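Pagination could look roughly like the sketch below, which uses Python's sqlite3 module as a stand-in for MySQL; the LIMIT ... OFFSET ... syntax is the same in both. The page size of 50, the fetch_page helper and the 120 sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE schedules_lessons ("
    "ID INTEGER PRIMARY KEY, ID_SCHEDULE INTEGER, period INTEGER)"
)
# Made-up data: 120 lessons, so the third page is only partly filled.
conn.executemany(
    "INSERT INTO schedules_lessons (ID_SCHEDULE, period) VALUES (?, ?)",
    [(i % 5 + 1, i % 8) for i in range(120)],
)

PAGE_SIZE = 50  # assumed page size

def fetch_page(conn, page, page_size=PAGE_SIZE):
    """Return one page of lessons; pages are numbered from 1."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT ID, ID_SCHEDULE, period FROM schedules_lessons "
        "ORDER BY ID LIMIT ? OFFSET ?",  # a stable ORDER BY keeps pages consistent
        (page_size, offset),
    ).fetchall()

first = fetch_page(conn, 1)
last = fetch_page(conn, 3)
print(len(first), first[0][0])   # rows on page 1 and the first ID
print(len(last), last[-1][0])    # rows on page 3 and the last ID
```

For large offsets the database still has to walk past the skipped rows, so this is a starting point rather than a tuned solution.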
