Which is more efficient - Multiple rows or multiple columns?

Each user will use this data at least 15 times when they are logged in. So READ is more important.

So I have two approaches. I know this is a rookie question, but I am just confused between the options:

Approach 1: have multiple rows with fewer columns,

id                      data              user
1                       task1              1 
2                       task2              1 
3                       task3              1
4                       task1              7

And Approach 2: have multiple columns with a single row per user

id   task1       task2         task3        user
1    True        True          True           1
2    True        False         False          7

Please suggest which is the best approach. Everything is heavily READ-based: I will literally be fetching all of this to calculate some permissions and actions, and it will be used on some major routes that users visit often.

I think you're doing some premature optimization here.

It's very rare that a database slows down because of small quick queries like this. What gets you is usually the big search query when it misbehaves or if the indices aren't optimal for the job.

As everyone said, approach 2 is terrible because you need to add columns every time you want to add a new task. That's a typical red flag for a bad design. In addition, if you want to search these columns, you'll also need to add indices on them.
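To make the maintenance difference concrete, here is a rough sketch of what adding a fourth task looks like under each approach. The table names `user_task_wide` and `user_task`, the column name `user_id` (used instead of the question's `user`, which is a reserved word in PostgreSQL), and the task name `task4` are all assumptions made for illustration.

```sql
-- Hypothetical tables mirroring the two layouts from the question.
CREATE TABLE user_task_wide (
    user_id integer PRIMARY KEY,
    task1   boolean NOT NULL DEFAULT false,
    task2   boolean NOT NULL DEFAULT false,
    task3   boolean NOT NULL DEFAULT false
);

CREATE TABLE user_task (
    user_id integer NOT NULL,
    task    text    NOT NULL,
    PRIMARY KEY (user_id, task)
);

-- Approach 2: a new task is a schema change (and possibly a new index).
ALTER TABLE user_task_wide
    ADD COLUMN task4 boolean NOT NULL DEFAULT false;

-- Approach 1: a new task is just another row.
INSERT INTO user_task (user_id, task) VALUES (1, 'task4');
```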

Approach 1 is the usual way, and it works well. The typical problem with this one is when you want to search based on attributes, because you have to join once per attribute, which doesn't optimize well.
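As an illustration of the "one join per attribute" problem, this sketch finds users who have both `task1` and `task2`. The table and column names (`user_task`, `user_id`) are assumed, not taken from the question.

```sql
-- Hypothetical Approach 1 table; user_id replaces the reserved word "user".
CREATE TABLE user_task (
    user_id integer NOT NULL,
    task    text    NOT NULL,
    PRIMARY KEY (user_id, task)
);

-- Each required task needs its own self-join on the table.
SELECT t1.user_id
FROM user_task AS t1
JOIN user_task AS t2
  ON t2.user_id = t1.user_id
 AND t2.task = 'task2'
WHERE t1.task = 'task1';
```

Requiring a third task would mean a third copy of the table in the query, which is why this kind of search gets harder to optimize as the number of attributes grows.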

In this case however, since you say this will be read at login, I guess this is about storing user rights or tasks associated with users. Perhaps you will select this data and cache it in the session so it only needs to be fetched once at login. So in this case, you should worry more about the queries that occur on every page, rather than the query that only occurs at login.

Anyway. Approach 1 has one gotcha: if the data isn't clustered, and the rows for one user sit in different pages of the table file on your disk, then fetching them will need one IO per row. That's not really a problem with SSDs, but still.

Fortunately, postgres supports two ways of avoiding that: cluster, and index-only scans.

CLUSTER just rewrites the table on disk in the order of the index you specify. Since you need an index on (user,task) anyway to quickly find if a user has a task, you can cluster on that index, and all the rows for a user will be in the same place on disk, so only one IO will be needed to fetch them. However, CLUSTER locks the table while it runs, so it's best to use it during scheduled maintenance. It is also a one-time operation: rows inserted later are not kept in index order, so it has to be repeated from time to time. If your table has only a few million rows, and if you set maintenance_work_mem high enough, it will only take a couple of seconds.
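A minimal sketch of that maintenance step, assuming a hypothetical `user_task` table whose primary key is (user_id, task). PostgreSQL names the primary key index `user_task_pkey` by default; the `1GB` setting is an arbitrary example value, not a recommendation.

```sql
-- Hypothetical table; the primary key creates the index user_task_pkey.
CREATE TABLE user_task (
    user_id integer NOT NULL,
    task    text    NOT NULL,
    PRIMARY KEY (user_id, task)
);

-- Give the sort more memory for this session only (example value).
SET maintenance_work_mem = '1GB';

-- Rewrite the table in primary key order. This takes an exclusive lock
-- on the table until it finishes.
CLUSTER user_task USING user_task_pkey;

-- Refresh planner statistics after the rewrite.
ANALYZE user_task;
```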

The other way is index-only scans. If you have an index on (user,task) and you run SELECT user,task FROM ... WHERE user=... then postgres can use an index-only scan. In the index, data is ordered by (user,task), which means it will do one IO to get the page with the first row, and the next rows for that user are stored just afterward in index order, usually on the same page, so they're already loaded and very fast to access.
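One way to check whether the planner actually picks an index-only scan is to look at the query plan. The sketch below uses assumed names (`user_task`, `user_id`) and generated filler data, since the planner tends to prefer a sequential scan on a table with only a handful of rows. PostgreSQL can only skip the table itself for pages marked all-visible, which is why the example vacuums first.

```sql
-- Hypothetical table; the primary key index covers both selected columns.
CREATE TABLE user_task (
    user_id integer NOT NULL,
    task    text    NOT NULL,
    PRIMARY KEY (user_id, task)
);

-- Filler data: 100000 made-up users with three tasks each.
INSERT INTO user_task (user_id, task)
SELECT u.user_id, 'task' || t.n::text
FROM generate_series(1, 100000) AS u(user_id)
CROSS JOIN generate_series(1, 3) AS t(n);

-- Update the visibility map and statistics (VACUUM cannot run inside
-- a transaction block).
VACUUM ANALYZE user_task;

-- Inspect the plan for the per-user read.
EXPLAIN (ANALYZE, BUFFERS)
SELECT user_id, task
FROM user_task
WHERE user_id = 1;
```

If the plan shows an "Index Only Scan" node with few or no heap fetches, the query is being answered from the index alone. The planner is free to choose differently depending on table size and statistics.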

Notes:

Since you have no other columns, I'll assume (user,task) is unique, because it makes no sense to have duplicates in this case. So that can be your primary key, and you can drop the id and associated index. You don't have to use a sequence on every table if the data gives you a nice natural primary key.

"task" would usually be a foreign key to another table.

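Putting both notes together, the schema could look roughly like this. The table names `task` and `user_task` and the column `user_id` are assumptions; `user_id` is used instead of `user` because `user` is a reserved word in PostgreSQL and would need quoting. A foreign key from `user_id` to a users table is omitted because the question does not show that table.

```sql
-- Hypothetical lookup table of valid tasks.
CREATE TABLE task (
    name text PRIMARY KEY
);

-- One row per (user, task) pair. The natural key is the primary key,
-- so no surrogate id column or sequence is needed.
CREATE TABLE user_task (
    user_id integer NOT NULL,
    task    text    NOT NULL REFERENCES task (name),
    PRIMARY KEY (user_id, task)
);

-- Sample data matching the question.
INSERT INTO task (name) VALUES ('task1'), ('task2'), ('task3');
INSERT INTO user_task (user_id, task) VALUES
    (1, 'task1'), (1, 'task2'), (1, 'task3'), (7, 'task1');

-- The read path: all tasks for one user, served by the primary key index.
SELECT task FROM user_task WHERE user_id = 1;
```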