简体   繁体   中英

Performance question sql

I'm making a forum.

And I'm wondering if i should store the number of replies in the topic table or count the posts of the topic?
How much slower will it be if i use sql and count them? Lets say i have a billion posts. Will it be much slower? Im not planning on being that big but what if? How much slower would i be compared to stroing the num in topics?

Thanks

It will be slower as your db grows in size. If you are planning on having a large post table, store the value in the topic table

I just ran some tests on a MySQL 4.0 box we have using a table with over 1 million records.

SELECT COUNT(*) FROM MyTable; ~1 million took 22ms

SELECT COUNT(*) FROM MyTable WHERE Role=1; ~800,000 took 3.2s

SELECT COUNT(*) FROM MyTable WHERE Role=2; ~20 took 12ms

The Role column in this case was indexed and this was connecting to the MySQL remotely.

I think your posts table will have to get very large for the query times to really become an issue. I also think it is a pre-optimization to put the cache of the count in your topics table. Build it without it for now and if it becomes a problem its a pretty easy update to change it.

  1. Do not store the value in a table.
  2. Cache the value in the application for some time so the count(*) query wont be executed too often.
  3. Choose cache time depending on the server load: higher for very busy and zero for couple of users.

The count(*) in SqlServer is pretty fast (assuming you have index on the field you are counting on). So you just need to reduce number of hits under the heavy load.
If you will store the value in a table you will have a lot of hassle maintaining it.

This is going to affect scaling and is an issue of normalization. Hardcore normalization nerds will tell you that you shouldn't keep the number of posts on the topic because it causes redundant data. But you need to keep in mind that if you don't store it there you need to do an extra query on every load to fetch the number. The alternative is to do an extra query on every update/insert instead, which will almost always occur much less often than select's. As you scale a site to support a lot of traffic it becomes almost inevitable that you have to eventually start to de-normalize some of your data, especially in cases like this.

Redundant data isn't inherently bad. Poorly managed redundancy is. As long as you have the proper checks in place to prevent the data from getting out of sync then the potential benefit of storing the number of posts on the thread is worth the extra bit of code IMO.

I think a lot of this will depend on how rapidly you're pushing data in. If you store the value in a topic table, then you may find that you're needing to increment (or decrement if you delete records) very frequently too.

Indexes (indices?) may be a nicer option, as you can store a tiny subset of the data, and be able to access richer information. Consider the fact that it can be quite quick to count how many Farleys there are in the phone-book, because I can go straight there and easily count them.

So, as is often the case, the answer is probably 'It depends'.

I like storing counts in the table rather than counting them every time. It's such an easy operation and you never have to think about the expense of showing it when you're retrieving it. With a forum you're going to be displaying it more often than you're going to be changing it anyway so it makes sense to make that as cheap as possible. It might be a bit premature but it might save you some headaches later.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM