I have a table in our database with more than 20 million records, and each day it grows by an average of 100,000 records. I need to perform a count on that table, scanning at most the last 24 hours of records (~100k). My general approach:
First I get the current maximum id, as this is very fast:

    SELECT MAX(acc.id) FROM MyTable AS acc

Then I compute a lower bound in Java:

    Long tolerableMin = maxId - 100000;

Finally I count only the records above that id:

    "SELECT COUNT(*) FROM MyTable AS acc"
        + " WHERE acc.X = 'SomeValue'"
        + " AND acc.Y = 'OtherVal'"
        + " AND acc.id > " + tolerableMin
        + " ORDER BY id DESC"
The average execution time is ~2 seconds. When I do a straight COUNT(*) with the same WHERE clause but no condition on acc.id, the query hangs for more than 15 seconds. My question is: is there a better way to perform this count?
NB: I'm using this from a Java/Hibernate backend, with MySQL as the database server.
This is a perfect scenario for not having to count at all.
I would rather create a trigger that maintains the counter in a separate table. If you don't like triggers, consider having that counter table filled by a background job that runs from time to time.
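A minimal sketch of the trigger idea in MySQL; the counter table, column types, and trigger name here are invented for illustration, and this keeps an all-time count per (X, Y) pair (for a rolling 24-hour window you would additionally bucket by date):

```sql
-- Hypothetical counter table, one row per (X, Y) combination.
CREATE TABLE MyTableCounter (
    x   VARCHAR(32) NOT NULL,
    y   VARCHAR(32) NOT NULL,
    cnt BIGINT      NOT NULL,
    PRIMARY KEY (x, y)
);

-- Bump the matching counter on every insert into the big table.
CREATE TRIGGER MyTable_after_insert
AFTER INSERT ON MyTable
FOR EACH ROW
    INSERT INTO MyTableCounter (x, y, cnt)
    VALUES (NEW.X, NEW.Y, 1)
    ON DUPLICATE KEY UPDATE cnt = cnt + 1;

-- Reading the count is now a primary-key lookup, not a scan:
SELECT cnt FROM MyTableCounter
WHERE x = 'SomeValue' AND y = 'OtherVal';
```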
There are really few cases in real life where you need truly real-time data; for a counter like this, a value that is 30 minutes to a few hours stale is often quite fine.
Another brainstorming idea: index all the data you need to count in Solr or some other NoSQL store; the count would then be much quicker there.
I don't see any better way to speed up a plain COUNT on such a big SQL table.
Plan A: INDEX(x, y, id) and toss the ORDER BY.
Plan B: You want only the last 24 hours' worth, but where is the timestamp? Maybe that is x? Then INDEX(timestamp, y) and toss the ORDER BY.
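Assuming the table and column names from the question (and, for Plan B, a hypothetical `created_at` timestamp column), the two plans would look something like this:

```sql
-- Plan A: covering index so the ranged COUNT is an index-only scan.
ALTER TABLE MyTable ADD INDEX idx_x_y_id (X, Y, id);

-- Plan B: if there is a timestamp column, index it first.
ALTER TABLE MyTable ADD INDEX idx_ts_y (created_at, Y);

-- Plan B count: no MAX(id) round trip and no ORDER BY needed.
SELECT COUNT(*) FROM MyTable
WHERE created_at >= NOW() - INTERVAL 24 HOUR
  AND Y = 'OtherVal';
```

An ORDER BY on a COUNT(*) query does nothing useful anyway, since the result is a single row.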
Plan C: Build and maintain a "Summary table": http://mysql.rjweb.org/doc.php/summarytables
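A minimal version of Plan C, again assuming a `created_at` timestamp; the summary table and job shown here are an illustrative sketch, not a prescribed schema:

```sql
-- Daily rollup, maintained incrementally by a scheduled job.
CREATE TABLE MySummary (
    dy  DATE         NOT NULL,
    x   VARCHAR(32)  NOT NULL,
    y   VARCHAR(32)  NOT NULL,
    cnt INT UNSIGNED NOT NULL,
    PRIMARY KEY (dy, x, y)
);

-- Run once a day: fold yesterday's rows into the summary.
INSERT INTO MySummary (dy, x, y, cnt)
SELECT DATE(created_at), X, Y, COUNT(*)
FROM MyTable
WHERE created_at >= CURDATE() - INTERVAL 1 DAY
  AND created_at <  CURDATE()
GROUP BY DATE(created_at), X, Y;

-- The expensive COUNT becomes a tiny lookup over a few rows:
SELECT SUM(cnt) FROM MySummary
WHERE dy >= CURDATE() - INTERVAL 1 DAY
  AND x = 'SomeValue' AND y = 'OtherVal';
```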