Sum one table and update other with result, or just do `sum` on select?

This might be a dumb question, but I'm creating a system in which users are not allowed to upload more than X KB of pictures in total.

When uploading a picture I update the images table with the size of the image in KB and other info.

Now, should I also keep track of the total size of each user's images in the users table? Or should I just do a `select sum(size) from images where user = xxx` every time I want to check the limit, which might be on every new upload?

What would be the best approach from a relational point of view?

You can use either method.
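The simpler of the two, summing on demand, might look roughly like the sketch below. It uses Python's built-in `sqlite3` as a stand-in for MySQL so that it runs as-is; the column names (`user_id`, `size_kb`), the 1000 KB limit, and the helper names are assumptions for illustration, not part of the question.

```python
import sqlite3

LIMIT_KB = 1000  # hypothetical per-user quota, in KB


def make_db():
    """Create an in-memory database with an assumed images table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE images (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            size_kb INTEGER NOT NULL
        );
        -- An index on (user_id, size_kb) lets the SUM be answered from the index.
        CREATE INDEX images_user_size ON images (user_id, size_kb);
    """)
    return conn


def used_kb(conn, user_id):
    """Return the total KB stored for one user (0 if they have no images)."""
    row = conn.execute(
        "SELECT COALESCE(SUM(size_kb), 0) FROM images WHERE user_id = ?",
        (user_id,),
    ).fetchone()
    return row[0]


def upload_if_allowed(conn, user_id, size_kb, limit_kb=LIMIT_KB):
    """Record the image only if it keeps the user within the limit."""
    with conn:  # commits on success, rolls back on an exception
        if used_kb(conn, user_id) + size_kb > limit_kb:
            return False
        conn.execute(
            "INSERT INTO images (user_id, size_kb) VALUES (?, ?)",
            (user_id, size_kb),
        )
    return True
```

With several concurrent connections, the check and the insert would need to run in one transaction with suitable locking, otherwise two simultaneous uploads could both pass the check.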

However, because you have a business rule related to the sum of the sizes, I might suggest that you use triggers to maintain the sum at the user level. Although this has some additional overhead for `insert`s/`update`s/`delete`s, it has much less overhead when returning information about a user.

This has a few other advantages as well:

  • You can impose business rules on the sizes. For instance, you can round the sizes up to the nearest 1k and then sum them. You wouldn't want such business logic spread through multiple queries.
  • You can implement a check constraint directly in the users table (well, you can do this in recent versions of MySQL; `CHECK` constraints are enforced from 8.0.16 onward).
  • You can index the total image size, so you easily see who is closest to their limit.
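A minimal sketch of the trigger approach, combined with a check constraint on the stored total, is shown below. It uses Python's built-in `sqlite3` so that it runs as-is; MySQL trigger syntax differs somewhat (for example `FOR EACH ROW` and delimiter handling), and the names `total_kb`, `size_kb`, `try_upload`, and the 1000 KB limit are assumptions for illustration.

```python
import sqlite3

LIMIT_KB = 1000  # hypothetical per-user quota, in KB

SCHEMA = f"""
CREATE TABLE users (
    id       INTEGER PRIMARY KEY,
    -- Denormalized running total, guarded by a check constraint.
    total_kb INTEGER NOT NULL DEFAULT 0
             CHECK (total_kb BETWEEN 0 AND {LIMIT_KB})
);
CREATE TABLE images (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    size_kb INTEGER NOT NULL
);
-- Keep users.total_kb in step with every change to images.
CREATE TRIGGER images_ai AFTER INSERT ON images BEGIN
    UPDATE users SET total_kb = total_kb + NEW.size_kb WHERE id = NEW.user_id;
END;
CREATE TRIGGER images_ad AFTER DELETE ON images BEGIN
    UPDATE users SET total_kb = total_kb - OLD.size_kb WHERE id = OLD.user_id;
END;
CREATE TRIGGER images_au AFTER UPDATE OF size_kb, user_id ON images BEGIN
    UPDATE users SET total_kb = total_kb - OLD.size_kb WHERE id = OLD.user_id;
    UPDATE users SET total_kb = total_kb + NEW.size_kb WHERE id = NEW.user_id;
END;
"""


def make_db():
    """Create an in-memory database with the assumed schema and triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def try_upload(conn, user_id, size_kb):
    """Insert an image row; return False if the quota check rejects it."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO images (user_id, size_kb) VALUES (?, ?)",
                (user_id, size_kb),
            )
        return True
    except sqlite3.IntegrityError:
        # The trigger's UPDATE violated the CHECK, so the insert was undone too.
        return False
```

In this sketch the limit is enforced by the database itself: an insert that would push `total_kb` past the limit fails as a whole, so the application never needs to run the aggregate query during an upload.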

Storing the SUM in the users table is one type of denormalization.

This can be worthwhile if you need to query the sum frequently, and it's too slow to do the aggregate query every time you need it.

But you accept the risk that the stored sum in the users table will become out of sync with the real SUM(size) of the associated images.

You wouldn't think this would be difficult, but in practice, there are lots of edge cases where the stored sum fails to be updated. You will end up periodically running the aggregate query in the background, to overwrite the stored sum, just in case it has gotten out of sync.
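Such a background correction could be sketched as follows, again with Python's built-in `sqlite3` standing in for MySQL. The table and column names (`users.total_kb`, `images.size_kb`) and the function name `reconcile_totals` are assumptions for illustration.

```python
import sqlite3


def make_db():
    """Create an in-memory database with the assumed tables (no triggers)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (
            id       INTEGER PRIMARY KEY,
            total_kb INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE images (
            id      INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL,
            size_kb INTEGER NOT NULL
        );
    """)
    return conn


def reconcile_totals(conn):
    """Overwrite stale users.total_kb values with the real sum.

    Returns the number of user rows that were out of sync.
    """
    real_sum = (
        "(SELECT COALESCE(SUM(size_kb), 0) "
        "FROM images WHERE images.user_id = users.id)"
    )
    with conn:  # commits on success, rolls back on an exception
        cur = conn.execute(
            f"UPDATE users SET total_kb = {real_sum} "
            f"WHERE total_kb <> {real_sum}"
        )
        return cur.rowcount
```

Logging the returned count on each run is one way to notice how often the stored sum drifts, which can help track down the code path that fails to update it.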

Denormalization is more work for you as a coder, because you have to write code to correct for anomalies like that. Be conservative about how many cases of denormalization you create, because each one obligates you to do more work.

But if it's very important that your query for the sum return the result faster than is possible by running the aggregate query, then that's what you have to do.

In my experience, all optimizations come with a price.
