MySQL OUTER LEFT JOIN performance

I am updating an existing web-based inventory system that pulls data from a MySQL database. The main structures for the stored data are "items" and "tags", with a one-to-many relationship (an item can have multiple corresponding tags).

The existing front-end for the data is a Backbone.js app that pulls the entire datastore on login and manipulates that data in memory, committing back to the database when necessary via a RESTful interface. (This is not how I would have designed the system, but it is now a common pattern in Backbone and Spine apps, and it is how almost all of the tutorials and books teach these frameworks.)

To serve the initial fetch, in which the front-end captures the entire dataset (about 1000 items and 10,000 item tags at this point), the back-end performs a SELECT query on the items table and then a separate SELECT query on the tags table for each item fetched. Performance sucks, obviously. I thought this could be improved with a JOIN, figuring one SELECT query is better than 1000. The following query fetches the data I need but takes over 15s to execute, even on my local development server. What gives? Can we improve this system or query without setting up additional infrastructure like a caching key-value store?

SELECT items.*, itemtags.id as `tag_id`, itemtags.tag, itemtags.type
FROM items LEFT OUTER JOIN
     itemtags
     ON items.id = itemtags.item_id
ORDER BY items.id;

Here are the table structures:

CREATE TABLE `items` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `num` int(11) NOT NULL,
  `title` varchar(100) NOT NULL,
  `length_inches` int(10) unsigned DEFAULT NULL,
  `length_feet` int(10) unsigned DEFAULT NULL,
  `width_inches` int(10) unsigned DEFAULT NULL,
  `width_feet` int(10) unsigned DEFAULT NULL,
  `height_inches` int(10) unsigned DEFAULT NULL,
  `height_feet` int(10) unsigned DEFAULT NULL,
  `depth_inches` int(10) unsigned DEFAULT NULL,
  `depth_feet` int(10) unsigned DEFAULT NULL,
  `retail_price` int(10) unsigned DEFAULT NULL,
  `discount` int(10) unsigned DEFAULT NULL,
  `decorator_price` int(10) unsigned DEFAULT NULL,
  `new_price` int(10) unsigned DEFAULT NULL,
  `sold` int(10) unsigned NOT NULL,
  `push_date` int(10) unsigned DEFAULT NULL,
  `updated` int(10) unsigned NOT NULL,
  `created` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1747 DEFAULT CHARSET=latin1;

CREATE TABLE `itemtags` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `item_id` int(10) unsigned NOT NULL,
  `tag` varchar(100) NOT NULL,
  `type` varchar(100) NOT NULL,
  `created` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=61474 DEFAULT CHARSET=latin1;

In terms of performance, you are probably not comparing like with like.

The SQL query does all of the following, and runs each to completion:

  • Joining the two tables together
  • Sorting the results by items.id
  • Returning all the results

Is the original version doing all three of these and waiting until they are completed?

My guess is that the original code is pulling the items back in the order you want them, and then only pulling the tags for a handful that are actually needed at any given time.

In addition, it is unclear how large the items.* data is. The way the query is formulated, you are pulling this about 10 times for each item -- potentially a much larger return set than the original data.
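One way to avoid repeating the items.* columns for every tag is to fetch the two tables separately and let the application attach tags to items by item_id. This is only a sketch: it assumes the back-end can group the second result set by item_id itself, which is not shown here.

```sql
-- Query 1: each item exactly once (about 1,000 rows).
SELECT items.*
FROM items
ORDER BY items.id;

-- Query 2: every tag exactly once (about 10,000 narrow rows).
-- The application attaches each row to its item via item_id.
SELECT itemtags.id AS tag_id, itemtags.item_id, itemtags.tag, itemtags.type
FROM itemtags
ORDER BY itemtags.item_id, itemtags.id;
```

That is two round trips instead of 1,000, and no item column is transferred more than once.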

The real question is why you need all this information in the memory of the application. You have the database, so just pull back what you need when you need it. Are you familiar with LIMIT and OFFSET? These may be what you are really looking for.
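As a rough sketch of that idea, the query below fetches one page of items together with only the tags that belong to that page. The page size of 50 and the aliases page and t are arbitrary choices for illustration, not part of the original system.

```sql
SELECT page.*, t.id AS tag_id, t.tag, t.type
FROM (
    -- Limit the items first, so the page size counts items,
    -- not joined item/tag rows.
    SELECT *
    FROM items
    ORDER BY id
    LIMIT 50 OFFSET 0
) AS page
LEFT OUTER JOIN itemtags AS t
    ON t.item_id = page.id
ORDER BY page.id, t.id;
```

Increasing OFFSET by the page size fetches the next page.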

I think you could use this:

SELECT items.*, a.id as `tag_id`, a.tag, a.type
FROM items LEFT OUTER JOIN
     (SELECT id, item_id, tag, type from itemtags ORDER BY 1,2,3) a
     ON items.id = a.item_id
ORDER BY items.id;

I didn't really change much, just the alias. a doesn't signify anything important.

I didn't fill the tables, but your original query took 4ms and mine took 1ms.

http://sqlfiddle.com/#!2/b9551/6

Your application can pull the entire data store regardless of what you have in your data set; data store and data set are not synonymous.

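You are also missing an index. Both tables have a primary key on id, but itemtags.item_id, the column you join on, is not indexed, so MySQL ends up scanning itemtags repeatedly to find the matching tags. You should put an index on itemtags.item_id so the join can return results quicker. Note that the ORDER BY in my sub-query only sorts the derived table; it does not create an index. Hope this helps.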
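A minimal sketch of adding an index on the join column and then checking the plan. The index name idx_itemtags_item_id is made up for this example; any unused name works.

```sql
-- Index the column used in the ON clause of the join.
-- idx_itemtags_item_id is a hypothetical name.
ALTER TABLE itemtags ADD INDEX idx_itemtags_item_id (item_id);

-- Ask MySQL how it plans to run the original query.
EXPLAIN
SELECT items.*, itemtags.id AS tag_id, itemtags.tag, itemtags.type
FROM items
LEFT OUTER JOIN itemtags
    ON items.id = itemtags.item_id
ORDER BY items.id;
```

If the optimizer picks up the index, the EXPLAIN row for itemtags should name it in the key column rather than showing a full table scan.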
