
SQL LIMIT to get latest records

I am writing a script that lists 25 items from each of 12 categories. The database structure looks like this:

tbl_items
---------------------------------------------
item_id | item_name | item_value | timestamp 
---------------------------------------------

tbl_categories
-----------------------------
cat_id | item_id | timestamp
-----------------------------

There are around 600,000 rows in the table tbl_items. I am using this SQL query:

SELECT e.item_id, e.item_value
  FROM tbl_items AS e
  JOIN tbl_categories AS cat WHERE e.item_id = cat.item_id AND cat.cat_id = 6001
  LIMIT 25

I run the same query in a loop for cat_id from 6000 to 6012. But I want the latest records of every category. If I use something like:

SELECT e.item_id, e.item_value
  FROM tbl_items AS e
  JOIN tbl_categories AS cat WHERE e.item_id = cat.item_id AND cat.cat_id = 6001
  ORDER BY e.timestamp
  LIMIT 25

...the query runs for approximately 10 minutes, which is not acceptable. Can I use LIMIT in a better way to get the latest 25 records for each category?

Can anyone help me achieve this without ORDER BY? Any ideas or help will be highly appreciated.

EDIT

tbl_items

+---------------------+--------------+------+-----+---------+-------+
| Field               | Type         | Null | Key | Default | Extra |
+---------------------+--------------+------+-----+---------+-------+
| item_id             | int(11)      | NO   | PRI | 0       |       |
| item_name           | longtext     | YES  |     | NULL    |       |
| item_value          | longtext     | YES  |     | NULL    |       |
| timestamp           | datetime     | YES  |     | NULL    |       |
+---------------------+--------------+------+-----+---------+-------+

tbl_categories

+----------------+------------+------+-----+---------+-------+
| Field          | Type       | Null | Key | Default | Extra |
+----------------+------------+------+-----+---------+-------+
| cat_id         | int(11)    | NO   | PRI | 0       |       |
| item_id        | int(11)    | NO   | PRI | 0       |       |
| timestamp      | datetime   | YES  |     | NULL    |       |
+----------------+------------+------+-----+---------+-------+

Can you add indices? If you add an index on the timestamp and other appropriate columns the ORDER BY won't take 10 minutes.
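
As a rough illustration of that suggestion, something like the following could be tried. The index name idx_items_timestamp is made up, and whether the optimizer actually uses the index for the sort depends on the data and the MySQL version, so EXPLAIN is included to check the plan:

```sql
-- Hypothetical index name; table and column names come from the question.
CREATE INDEX idx_items_timestamp ON tbl_items (timestamp);

-- EXPLAIN shows which indexes the optimizer chooses for the join and the sort.
EXPLAIN
SELECT e.item_id, e.item_value
  FROM tbl_items AS e
  JOIN tbl_categories AS cat ON e.item_id = cat.item_id
 WHERE cat.cat_id = 6001
 ORDER BY e.timestamp DESC
 LIMIT 25;
```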

First of all:

This looks like an N:M relation between items and categories: an item may be in several categories. I say this because the categories table has an item_id foreign key.

If it is not an N:M relationship, you should consider changing the design. If it is a 1:N relationship, where a category has several items, then the items table must contain a category_id foreign key.

Working with N:M:

I have rewritten your query to use an inner join instead of a cross join:

  SELECT e.item_id, e.item_value
  FROM 
     tbl_items AS e
  JOIN 
     tbl_categories AS cat 
        on e.item_id = cat.item_id
  WHERE  
     cat.cat_id = 6001
  ORDER BY 
     e.timestamp DESC
  LIMIT 25

To optimize performance, the required index is:

create index idx_1 on tbl_categories( cat_id, item_id)

An index on items is not mandatory because the primary key is already indexed. An index that contains timestamp does not help as much. To be sure, you can try an index on items with item_id and timestamp, so that those values are read from the index without accessing the table:

create index idx_2 on tbl_items( item_id, timestamp)
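
One possible way to benefit from such an index is to pick the 25 item_id values in a derived table that touches only item_id and timestamp, and then join back to tbl_items to fetch item_value. This is only a sketch, not a measured fix: the aliases i and latest are mine, and the descending sort assumes that "latest" means the highest timestamp.

```sql
SELECT e.item_id, e.item_value
  FROM (
        -- Inner query reads only item_id and timestamp, no longtext columns.
        SELECT i.item_id, i.timestamp
          FROM tbl_categories AS cat
          JOIN tbl_items AS i ON i.item_id = cat.item_id
         WHERE cat.cat_id = 6001
         ORDER BY i.timestamp DESC
         LIMIT 25
       ) AS latest
  -- Join back by primary key to fetch the wide column for just 25 rows.
  JOIN tbl_items AS e ON e.item_id = latest.item_id
 ORDER BY latest.timestamp DESC;
```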

To increase performance, you can replace your loop over categories with a single query:

  select T.cat_id, T.item_id, T.item_value from 
  (SELECT cat.cat_id, e.item_id, e.item_value
   FROM 
     tbl_items AS e
   JOIN 
     tbl_categories AS cat 
        on e.item_id = cat.item_id
   ORDER BY 
     e.timestamp DESC
   LIMIT 25
  ) T
  WHERE  
     T.cat_id between 6001 and 6012
  ORDER BY
     T.cat_id, T.item_id
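
Keep in mind that a LIMIT inside a derived table caps the rows of the derived table as a whole, not the rows per category. If window functions are available (MySQL 8.0 or later, which is an assumption about your server), a sketch that returns up to 25 of the newest rows for each category could look like this. The alias ranked and the column rn are names I made up:

```sql
-- Requires window-function support (MySQL 8.0 or later).
SELECT ranked.cat_id, ranked.item_id, ranked.item_value
  FROM (
        SELECT cat.cat_id,
               e.item_id,
               e.item_value,
               -- Number the rows of each category from newest to oldest.
               ROW_NUMBER() OVER (
                   PARTITION BY cat.cat_id
                   ORDER BY e.timestamp DESC
               ) AS rn
          FROM tbl_items AS e
          JOIN tbl_categories AS cat ON e.item_id = cat.item_id
         WHERE cat.cat_id BETWEEN 6001 AND 6012
       ) AS ranked
 WHERE ranked.rn <= 25
 ORDER BY ranked.cat_id, ranked.rn;
```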

Please try these queries and come back with your comments so we can refine them if necessary.

Leaving aside all other factors, I can tell you that the main reason the query is so slow is that the result involves longtext columns.

BLOB and TEXT fields in MySQL are mostly meant to store complete files, textual or binary. They are stored separately from the row data for InnoDB tables. Each time a query involves sorting (explicitly or for a GROUP BY), MySQL is sure to use disk for the sorting (because it cannot be sure in advance how large any file is).

As a rule of thumb: if a query needs to return more than a single row of a column, the type of that column should almost never be TEXT or BLOB; use VARCHAR or VARBINARY instead.
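
If you are able to change the table, the conversion might look roughly like the sketch below. The length 255 is purely a placeholder I picked, not something taken from your data, so check the longest stored values first; depending on the SQL mode, over-long values are either rejected or truncated.

```sql
-- Find the longest stored values before choosing VARCHAR lengths.
SELECT MAX(CHAR_LENGTH(item_name))  AS max_name_len,
       MAX(CHAR_LENGTH(item_value)) AS max_value_len
  FROM tbl_items;

-- Assumed lengths: adjust 255 to the result of the query above before running.
ALTER TABLE tbl_items
  MODIFY item_name  VARCHAR(255) NULL,
  MODIFY item_value VARCHAR(255) NULL;
```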

UPD

If you cannot update the table, the query will hardly be fast with the current indexes and column types. But, anyway, here is a similar question and a popular solution to your problem: How to SELECT the newest four items per category?
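
I am not reproducing the linked answer here, but one approach that works without window functions is to give each category its own parenthesized SELECT with its own LIMIT and combine them with UNION ALL. The sketch below shows two categories only; the same block would be repeated for the remaining cat_id values.

```sql
-- Each branch is limited separately, so every category gets up to 25 rows.
(SELECT cat.cat_id, e.item_id, e.item_value
   FROM tbl_items AS e
   JOIN tbl_categories AS cat ON e.item_id = cat.item_id
  WHERE cat.cat_id = 6001
  ORDER BY e.timestamp DESC
  LIMIT 25)
UNION ALL
(SELECT cat.cat_id, e.item_id, e.item_value
   FROM tbl_items AS e
   JOIN tbl_categories AS cat ON e.item_id = cat.item_id
  WHERE cat.cat_id = 6002
  ORDER BY e.timestamp DESC
  LIMIT 25);
```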


 