A single query with many joins in one API request, or a few queries with some joins in separate API requests?

Which is the best practice, and which delivers the best performance?

I currently have a query with many LEFT JOINs that fetches a user and all of their data, such as friends, friend requests, and so on:

SELECT
    `user`.`id` AS `user_id`,
    `user`.`name` AS `user_name`,
    `manager`.`id` AS `manager_id`,
    `competition`.`id` AS `manager_competition_id`,
    `competition`.`name` AS `manager_competition_name`,
    `competition`.`week` AS `manager_competition_week`,
    `country`.`id` AS `manager_competition_country_id`,
    `country`.`name` AS `manager_competition_country_name`,
    `club_template`.`id` AS `manager_club_template_id`,
    `club_template`.`name` AS `manager_club_template_name`,
    `club`.`id` AS `manager_club_id`,
    `club`.`name` AS `manager_club_name`,
    `club`.`ready` AS `manager_club_ready`,
    `friend`.`friend_id` AS `friend_id`,
    `friend_user`.`name` AS `friend_name`
FROM
    `users` AS `user`
LEFT JOIN
    `managers` AS `manager`
ON
    `manager`.`user_id` = `user`.`id`
LEFT JOIN
    `competitions` AS `competition`
ON
    `competition`.`id` = `manager`.`competition_id`
LEFT JOIN
    `countries` AS `country`
ON
    `country`.`id` = `competition`.`country_id`
LEFT JOIN
    `club_templates` AS `club_template`
ON
    `club_template`.`id` = `manager`.`club_template_id`
LEFT JOIN
    `clubs` AS `club`
ON
    `club`.`id` = `manager`.`club_id`
LEFT JOIN
    `friends` AS `friend`
ON
    `friend`.`user_id` = `user`.`id`
LEFT JOIN
    `users` AS `friend_user`
ON
    `friend_user`.`id` = `friend`.`friend_id`
WHERE
    `user`.`id` = 1

As you can see, it's a very big query. My reasoning was that it's better to have a single query that can be served by one API request, like this...

/api/users/1

...versus a few queries, each in their own API request, like this...

/api/users/1
/api/users/1/friends
/api/users/1/friend_requests
/api/users/1/managers

But now I'm worried that, since it has become such a huge query, it will actually hurt performance more than splitting it up into separate API requests would.

What will scale better?

Update

I've updated the question to show the full query. This is not the final version; I plan to add even more joins (or not, depending on the answers).

Each table has a PRIMARY KEY on id. All association columns (competition_id, club_id, and so on) have a regular INDEX. The database engine is InnoDB.

Of the two, I would recommend the latter: many niche queries. It gives the caller the flexibility to pull back just what they want, and it is less likely to silently introduce performance problems (e.g. when there is only one way to retrieve data, everyone uses it no matter how small a subset of that data they are actually interested in).

That said, it certainly isn't immune to performance problems; it just means the caller may be more aware of them by virtue of issuing so many API calls.

You could provide both, though. Make it clear from your naming convention that the expensive version pulls back all data and is meant for cases where the caller would otherwise need to make, say, 20-30 calls to get the full picture.

Examples:

1 - Imagine having to fetch that full user object just to find out the name. That is really wasteful, and if done inadvertently in a big loop, it is a performance trap waiting to happen. Prefer a getUserName(id) method that reads back just that one value.

2 - On the other hand, if you want to display the user's full profile on a page, then a single getFullUserProfile(id) call is the most efficient (1 call rather than 10-20).

Edit - a further useful example. Anticipate where many values are sought: rather than forcing the caller to run getUserName(id) 500 times to get all names matching a certain condition (all admin users, perhaps?), provide a List<String> getAdminUserNames() that returns all of that data in one call.
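
To make the three shapes above concrete, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for MySQL. The snake_case function names mirror the methods mentioned above, but the is_admin column, the sample rows, and the make_demo_db helper are assumptions made purely for illustration; they are not part of the asker's schema.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database (hypothetical data, SQLite instead of MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, is_admin INTEGER DEFAULT 0);
        CREATE TABLE friends (user_id INTEGER, friend_id INTEGER);
        INSERT INTO users (id, name, is_admin) VALUES (1, 'alice', 1), (2, 'bob', 0), (3, 'carol', 1);
        INSERT INTO friends (user_id, friend_id) VALUES (1, 2), (1, 3);
        """
    )
    return conn


def get_user_name(conn, user_id):
    """Narrow call: read back one value and nothing else."""
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return row[0] if row else None


def get_full_user_profile(conn, user_id):
    """Expensive call: the name says it pulls back everything for one user."""
    name = get_user_name(conn, user_id)
    if name is None:
        return None
    friend_rows = conn.execute(
        "SELECT u.name FROM friends f JOIN users u ON u.id = f.friend_id "
        "WHERE f.user_id = ? ORDER BY u.id",
        (user_id,),
    ).fetchall()
    return {"id": user_id, "name": name, "friends": [r[0] for r in friend_rows]}


def get_admin_user_names(conn):
    """Bulk call: one query instead of one get_user_name call per admin."""
    rows = conn.execute("SELECT name FROM users WHERE is_admin = 1 ORDER BY id").fetchall()
    return [r[0] for r in rows]
```

A caller that only needs a label uses get_user_name(conn, 1); a profile page uses get_full_user_profile(conn, 1); a report over all admins uses get_admin_user_names(conn) rather than looping over the narrow call.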

Cool question.

I think you need to think about the domain concepts behind the query and try to stay as faithful to those as you can.

So, taking a wild guess based on your query, you have users in various states of completion: users who have created their profile but not yet joined a competition, users who have joined a competition but not yet formed a club, and so on. This reflects your domain model, and I would expect your API to reflect it too. Using your RESTful example:

/api/users/profile
/api/users/signedUpUsers/
/api/users/usersWithClubs/

The first invocation (/api/users/profile) allows you to return the user profile, but none of your outer join details other than the user's state (and perhaps URLs where other additional data can be found).
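
As a rough sketch of that idea, the profile response could carry only the user's own fields, a derived state, and URLs pointing at the additional data. The state names and the build_profile function below are hypothetical; the linked paths reuse the /api/users/1/... routes from the question.

```python
def build_profile(user):
    """Return a slim profile: own fields, a state, and links to the rest.

    `user` is assumed to be a dict with id, name, competition_id and club_id
    (the last two may be None). The state names are made up for illustration.
    """
    if user.get("club_id") is not None:
        state = "has_club"
    elif user.get("competition_id") is not None:
        state = "joined_competition"
    else:
        state = "profile_only"

    base = "/api/users/{}".format(user["id"])
    links = {
        "friends": base + "/friends",
        "friend_requests": base + "/friend_requests",
    }
    # Only advertise manager data once the user has actually joined something.
    if state != "profile_only":
        links["managers"] = base + "/managers"

    return {"id": user["id"], "name": user["name"], "state": state, "links": links}
```

The client then follows only the links it needs, so the outer-join data is fetched on demand instead of on every profile request.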

Stick with the domain approach, and build performance testing into your development lifecycle; optimize as you go, and only change the design if you can prove you have a problem.
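
One lightweight way to "prove you have a problem" is to time both shapes against the same data. The sketch below is only an illustration: it uses an in-memory SQLite database with made-up rows and a cut-down version of the query (users and friends only), so the absolute numbers say nothing about MySQL/InnoDB. A real measurement would run against the actual schema with realistic data volumes and include the HTTP round trips.

```python
import sqlite3
import time

JOINED = (
    "SELECT u.id, u.name, f.friend_id, fu.name "
    "FROM users u "
    "LEFT JOIN friends f ON f.user_id = u.id "
    "LEFT JOIN users fu ON fu.id = f.friend_id "
    "WHERE u.id = ?"
)
USER_ONLY = "SELECT id, name FROM users WHERE id = ?"
FRIENDS_ONLY = (
    "SELECT f.friend_id, fu.name "
    "FROM friends f JOIN users fu ON fu.id = f.friend_id "
    "WHERE f.user_id = ?"
)


def make_db():
    """Create hypothetical test data: 1000 users, 50 friends for user 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE friends (user_id INTEGER, friend_id INTEGER);
        CREATE INDEX friends_user_id ON friends (user_id);
        """
    )
    conn.executemany(
        "INSERT INTO users (id, name) VALUES (?, ?)",
        [(i, "user{}".format(i)) for i in range(1, 1001)],
    )
    conn.executemany(
        "INSERT INTO friends (user_id, friend_id) VALUES (?, ?)",
        [(1, i) for i in range(2, 52)],
    )
    return conn


def average_seconds(conn, statements, params, repeats=200):
    """Average wall-clock time to run all statements once, over `repeats` runs."""
    start = time.perf_counter()
    for _ in range(repeats):
        for sql in statements:
            conn.execute(sql, params).fetchall()
    return (time.perf_counter() - start) / repeats


def compare(conn, user_id=1):
    """Time the single joined query against the two split queries."""
    return {
        "one_joined_query": average_seconds(conn, [JOINED], (user_id,)),
        "two_split_queries": average_seconds(conn, [USER_ONLY, FRIENDS_ONLY], (user_id,)),
    }
```

Running compare(make_db()) as part of a test suite gives a baseline that can be re-checked whenever another join is added.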
