Table with over 12 million rows running into performance problems

The table that is having problems is a relationship table for keyword analysis of websites. It has five columns (keyword_id, website_id, occurrence, percentage, date).

This lets us track keyword statistics for a website over a period of time and show them to the website owner as a graph.

The problem is that we index about 57 unique keywords per website on average, and we index about 12,000 websites every day. That is roughly 684,000 new rows per day, which is why we are already running into performance problems. So you get the picture: this table is growing very fast.

I have an index on keyword_id, website_id, occurrence, percentage and date. So each one of them has an index, but I am still having problems with selects.

How would you solve this performance problem on MySQL with PHP?

NOTE: There is one index on each field, plus one index on all of them combined.

SQL QUERY 1: SELECT * FROM table WHERE keyword_id = "323242"
SQL QUERY 2: SELECT * FROM table WHERE website_id = "232"
SQL QUERY 3: SELECT * FROM table WHERE keyword_id = "323242" ORDER BY percentage
SQL QUERY 4: SELECT * FROM table WHERE website_id = "232" ORDER BY occurrence
SQL QUERY 5: SELECT * FROM table WHERE keyword_id = "323242" ORDER BY occurrence
SQL QUERY 6: SELECT * FROM table WHERE website_id = "232" ORDER BY date

What's the distribution and probability of the keywords? For example, if you had a keyword used by every site, every day, then after six months (12,000 sites over about 180 days) that's roughly 2.1M rows for a single keyword. I'm sure that's not the case, but popular words are going to get large quickly.
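One way to answer the distribution question is to count rows per keyword. This is only a sketch: `keyword_stats` is a hypothetical table name, since the question shows only the placeholder `table`.

```sql
-- Hypothetical table name: keyword_stats (the question uses the placeholder "table").
-- List the 20 keywords with the most rows, to see how skewed the data is.
SELECT keyword_id, COUNT(*) AS row_count
FROM keyword_stats
GROUP BY keyword_id
ORDER BY row_count DESC
LIMIT 20;
```

On a table this size the query has to read every entry of an index that contains keyword_id, so it is probably best run once, off-peak.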

The website_id queries shouldn't be too bad: only a few thousand rows per site.

If you're only doing queries by keyword_id and website_id, the other indexes are costing you time and space on every write (though they don't slow down reads).
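If that holds, the unused indexes could be listed and dropped roughly as follows. The table name `keyword_stats` and the index names are hypothetical; the real names are whatever `SHOW INDEX` reports in its Key_name column.

```sql
-- Hypothetical table name: keyword_stats.
-- List the indexes that currently exist on the table.
SHOW INDEX FROM keyword_stats;

-- Drop single-column indexes that none of the six queries filter on.
-- idx_occurrence, idx_percentage and idx_date are hypothetical names;
-- substitute the Key_name values reported by SHOW INDEX above.
ALTER TABLE keyword_stats
    DROP INDEX idx_occurrence,
    DROP INDEX idx_percentage,
    DROP INDEX idx_date;
```

Only do this after confirming that no other query in the application relies on those indexes.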

Ideally, a composite index on (keyword_id, percentage) would let the optimizer return a rather quick result for your "keyword_id sorted by percentage" query, and similarly for the others, but that can depend a lot on the layout of the data.
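A sketch of what those composite indexes might look like for queries 3 to 6, with the filter column first and the sort column second. The table name `keyword_stats` and all index names are assumptions made for illustration.

```sql
-- Hypothetical table and index names throughout.
-- Filter column first, ORDER BY column second.
CREATE INDEX idx_keyword_percentage ON keyword_stats (keyword_id, percentage);
CREATE INDEX idx_keyword_occurrence ON keyword_stats (keyword_id, occurrence);
CREATE INDEX idx_website_occurrence ON keyword_stats (website_id, occurrence);
CREATE INDEX idx_website_date       ON keyword_stats (website_id, `date`);

-- Inspect the plan for query 3. "Using filesort" in the Extra column
-- means MySQL still performs a separate sort step.
EXPLAIN
SELECT * FROM keyword_stats
WHERE keyword_id = 323242
ORDER BY percentage;
```

Because MySQL can use the leftmost prefix of a composite index, these indexes can also serve the plain keyword_id and website_id lookups (queries 1 and 2), which would make separate single-column indexes on those two columns redundant.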

How much memory is on the box and how fast are the drives? I would look at the I/O operations per second when you're running these queries. You could easily be just thrashing your drives.
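From inside MySQL, a rough first check is to compare the size of the table and its indexes with the memory the server has available for caching. The sketch below assumes the InnoDB storage engine (the question does not say which engine is used) and the hypothetical table name `keyword_stats`.

```sql
-- Assumes InnoDB; table name keyword_stats is hypothetical.
-- How much memory InnoDB may use to cache data and index pages (in bytes).
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Approximate on-disk size of the table's data and indexes, in MB.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'keyword_stats';

-- Innodb_buffer_pool_read_requests counts logical reads;
-- Innodb_buffer_pool_reads counts those that had to go to disk.
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';
```

If the data and indexes are much larger than the buffer pool and Innodb_buffer_pool_reads climbs quickly while the queries run, the drives are likely the bottleneck.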

With a decent amount of memory, the ORDER BY clauses should be pretty cheap. Sorting the rows in memory is likely cheaper than doing lots of random reads from disk, but that depends on the index and how it's organized in relation to the pages on disk.

Also, make sure all of your statistics are up to date. Bad statistics will murder your queries.
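In MySQL, index statistics can be refreshed with `ANALYZE TABLE`. A minimal sketch, again using the hypothetical table name `keyword_stats`:

```sql
-- Hypothetical table name: keyword_stats.
-- Recompute the key distribution statistics the optimizer relies on.
ANALYZE TABLE keyword_stats;

-- The Cardinality column shows the estimated number of distinct values per index.
SHOW INDEX FROM keyword_stats;
```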
