Is indexing link tables smart?

Question

So let's say, for example, I have two tables: Users and Items.

Users can have favorite Items, and an Item can be a favorite of multiple users, so I'll be using a linking table.

Now my linking table would contain something like:

id (int 11 AI)
user_id (int 11)
item_id (int 11)

Now, would it be necessary / useful to put an index on user_id and item_id, since this table will contain a lot of records over time?

I'm not 100% sure when to use indexes. My idea of when to use them (might be completely incorrect though) is that when you have a big database and need to search/filter on a column, you index it. If this is incorrect I'm sorry, it's just what I've always been told.

Answer 1

In short, yes.

Imagine how well joins would work if, each time you needed to match a primary key value to a foreign key in another table, the DBMS had to search the entire table for the matching keys.

Answer 2

Basically, yes, that's how it goes.

In this case, I'd say that an index on the user_id column would be useful, because you will display to the user a list of their favorites, right?

An index on the item_id might be less useful, because I doubt you're going to display a list of users that have favorited a specific item. Although you might care about the count ("100 users like this item"), so you might add that index after all. Or you might de-normalize and keep the count in the items table. That would give better performance, although you'll need to write extra code to maintain that number.
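
A minimal sketch of the de-normalized counter idea, using Python's built-in sqlite3 module as a stand-in for whatever database you actually run. The favorite_count column and the add_favorite helper are assumptions made up for illustration; the link table follows the layout from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        favorite_count INTEGER NOT NULL DEFAULT 0  -- de-normalized counter (assumed column)
    );
    CREATE TABLE favorites (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        user_id INTEGER NOT NULL,
        item_id INTEGER NOT NULL
    );
    INSERT INTO items (id, name) VALUES (1, 'lamp');
    INSERT INTO items (id, name) VALUES (2, 'chair');
""")


def add_favorite(conn, user_id, item_id):
    """Insert the link row and bump the counter in one transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO favorites (user_id, item_id) VALUES (?, ?)",
            (user_id, item_id),
        )
        conn.execute(
            "UPDATE items SET favorite_count = favorite_count + 1 WHERE id = ?",
            (item_id,),
        )


for user_id in (10, 11, 12):
    add_favorite(conn, user_id, 1)

# The cheap read: one row from items, no counting of link rows.
cached = conn.execute("SELECT favorite_count FROM items WHERE id = 1").fetchone()[0]
# The normalized equivalent that the counter must always agree with.
counted = conn.execute("SELECT COUNT(*) FROM favorites WHERE item_id = 1").fetchone()[0]
print(cached, counted)
```

The "extra code" is the second statement in add_favorite, plus a matching decrement wherever a favorite is removed. Without an index on item_id, the COUNT(*) query has to look at every link row.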

Last but not least - in a link table, you can do away with the id column. Just add the primary key index on both columns (user_id and item_id, in that order). This will make sure that you cannot enter duplicate rows, and since user_id is the first column in the index, you'll be able to use it in search queries. No need anymore to add a separate index on just the user_id column.

However this also depends on the code you're using. If you're using some kind of framework (ORM?) that REQUIRES an id column for every table, then this trick is useless.
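
Here is a rough sketch of the composite-key link table described above, run through Python's built-in sqlite3 module so it can be tried without a server. The table name favorites and the index name idx_favorites_item are assumptions; in MySQL the PRIMARY KEY (user_id, item_id) clause is written the same way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE favorites (
        user_id INTEGER NOT NULL,
        item_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, item_id)   -- no separate id column
    );
    -- Optional: only worth having if you also look rows up by item alone.
    CREATE INDEX idx_favorites_item ON favorites (item_id);
""")

conn.execute("INSERT INTO favorites VALUES (1, 100)")
conn.execute("INSERT INTO favorites VALUES (1, 101)")
conn.execute("INSERT INTO favorites VALUES (2, 100)")

# The primary key rejects a second copy of the same (user, item) pair.
try:
    conn.execute("INSERT INTO favorites VALUES (1, 100)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# user_id is the leading key column, so this lookup can use the primary key.
user_favorites = [
    row[0]
    for row in conn.execute(
        "SELECT item_id FROM favorites WHERE user_id = ? ORDER BY item_id", (1,)
    )
]
print(duplicate_rejected, user_favorites)
```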

As requested by the author, here's a quick intro on what indexes are.

Suppose you have a DB table which is just a bunch of rows in no particular order. Let's say we have a table people with the columns name, surname, age.

Now, when you want to find the age for John Smith you probably make a query like this:

select age from people where name='John' and surname='Smith'

When you do this on a table with no indexes, the DB engine can do only one thing - it has to go through ALL the rows and look for the ones that match. If there are 100,000 rows, it will be slow.
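
You can ask the engine what it intends to do. The sketch below uses SQLite through Python's sqlite3 module (an assumption for the sake of a runnable example; MySQL has its own EXPLAIN with different output) and a tiny made-up people table with no index at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, surname TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("John", "Smith", 42), ("Jane", "Doe", 35), ("John", "Doe", 27)],
)

query = "SELECT age FROM people WHERE name = 'John' AND surname = 'Smith'"

# EXPLAIN QUERY PLAN returns rows whose last column describes each step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query)]
ages = [row[0] for row in conn.execute(query)]

print(plan)  # with no index available, the only step is a scan of the table
print(ages)
```

The exact wording of the plan differs between SQLite versions, but it starts with SCAN, meaning every row is visited.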

Now there's a faster way of doing this. Think about a phonebook (the classical paper edition). On its thousand yellow pages there are phone numbers for hundreds of people. Yet you can find the number you seek very quickly, even if you're a human being. That's because the numbers are sorted alphabetically by name and surname. You open a random page and you can immediately see whether the number you're looking for is before or after the page you opened. Repeat a couple of times and you've found it.

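This kind of searching is called a "binary search". Your DB engine could do this too, if the records were sorted by name and surname. So this is roughly what a Primary Key is - it tells the DB to store the records not in some random order, but sorted by some columns. (Strictly, this describes engines that cluster the table on its primary key, such as MySQL's InnoDB, and real engines keep the order in a B-tree rather than a flat sorted list, but the idea is the same.) When a new record comes, it can quickly find its rightful place and push it in there, thus keeping the table forever sorted.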
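
To make the phonebook idea concrete, here is a small sketch of binary search over records kept sorted by (name, surname), using Python's standard bisect module. It is only an illustration of the principle, not how any particular engine stores its data; the find_age and insert_person helpers and the sample rows are made up.

```python
from bisect import bisect_left

# Records kept sorted by (name, surname), like the phonebook.
people = sorted([
    ("John", "Smith", 42),
    ("Jane", "Doe", 35),
    ("Alice", "Jones", 51),
    ("John", "Doe", 27),
    ("Zoe", "Brown", 19),
])
keys = [(name, surname) for name, surname, _ in people]


def find_age(name, surname):
    """Binary search: halve the candidate range until the key is located."""
    pos = bisect_left(keys, (name, surname))
    if pos < len(keys) and keys[pos] == (name, surname):
        return people[pos][2]
    return None


def insert_person(name, surname, age):
    """Find the record's rightful place and insert it there, keeping the order."""
    pos = bisect_left(keys, (name, surname))
    keys.insert(pos, (name, surname))
    people.insert(pos, (name, surname, age))


insert_person("Bob", "Stone", 64)
print(find_age("John", "Smith"))
print(find_age("Bob", "Stone"))
print(find_age("Nobody", "Here"))
```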

There are a few things to note here already.

First, you can make it sort by one or more columns, but, just like in a phonebook, the order is important. If you sort by name first and then by surname, then that's the order the records will be in. So you'll be able to quickly find all the records where name='John' or name='John' and surname='Smith', but it won't help you at all if you need to find just surname='Smith'. Just like in a phonebook.
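
The column-order rule can be checked against a real query planner. This sketch uses SQLite via Python's sqlite3 module and a plain (non-primary) index named idx_name_surname, both of which are assumptions chosen to keep the example runnable; the plan_for helper is made up as well.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, surname TEXT, age INTEGER)")
conn.execute("CREATE INDEX idx_name_surname ON people (name, surname)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("John", "Smith", 42), ("Jane", "Smith", 35), ("John", "Doe", 27)],
)


def plan_for(where_clause):
    """Return the planner's description of how it would run the query."""
    sql = "EXPLAIN QUERY PLAN SELECT age FROM people WHERE " + where_clause
    return " ".join(row[-1] for row in conn.execute(sql))


by_name = plan_for("name = 'John'")
by_both = plan_for("name = 'John' AND surname = 'Smith'")
by_surname = plan_for("surname = 'Smith'")

print(by_name)     # mentions idx_name_surname
print(by_both)     # mentions idx_name_surname
print(by_surname)  # falls back to scanning the table
```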

Second, pushing a record somewhere in the middle is also somewhat slow. Not criminally so, but still. Appending a record at the end is faster. Therefore people tend to use auto_increment columns for their Primary Keys, because then every new row will be placed at the end.

Third, in most DBs the Primary Key is used not only to search quickly, but also to uniquely identify the row. Which means that the DB will not be happy if there are two rows that have equal values for the Primary Key columns. In that case, it cannot determine which has to go first, and which last, and it's also not unique. Another reason to use auto_increment. Note that if the PK index has multiple columns in it, then their combination must be unique - every column individually may be non-unique. In our case that means that there can be many Johns and many Smiths, but only one John Smith.

But we still have a problem. What if we want to quickly find rows both by just the name, and just the surname? A PK index can only do one of those things, not both at the same time.

This is where other non-PK indexes come into play. You can add as many of those as you want to the table. In our case, we could create another index to hold just the surname column.

When we do so, the DB creates another hidden table (OK, not true, but you can think of it this way) which is a copy of the original table, but only with the surname column and a special link back to the rows in the original table. This hidden index table is sorted by the surname column. So when you now need to find a row by specifying just the surname, the DB engine can look it up in the hidden index table, and then follow the links back to the original rows and get the data from them. Much faster.
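
A quick way to see the extra index take effect is to compare the query plan before and after creating it. Again this is a sketch on SQLite through Python's sqlite3 module, with an assumed index name idx_surname and a made-up surname_plan helper; other engines report their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people ("
    "name TEXT, surname TEXT, age INTEGER, PRIMARY KEY (name, surname))"
)
conn.executemany(
    "INSERT INTO people VALUES (?, ?, ?)",
    [("John", "Smith", 42), ("Jane", "Smith", 35), ("John", "Doe", 27)],
)

query = "SELECT age FROM people WHERE surname = 'Smith'"


def surname_plan():
    """Describe how the engine would currently run the surname lookup."""
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query))


before = surname_plan()  # the (name, surname) key cannot help here
conn.execute("CREATE INDEX idx_surname ON people (surname)")
after = surname_plan()   # the new index is picked up

ages = sorted(row[0] for row in conn.execute(query))
print(before)
print(after)
print(ages)
```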

These non-PK indexes also typically come in a few flavors. There's the standard "index" which places no restrictions at all - you can have duplicate values in the columns, nulls, etc. There's a "unique" index, which enforces that all the values in the index need to be unique; and then there are sometimes speciality indexes like FullText, Spatial, etc. Indexes also tend to have some technical options, but you'll have to read the documentation of your DB for those.
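
The difference between a standard and a unique index is easy to demonstrate. The sketch below uses SQLite via Python's sqlite3 module; the users table, its email and city columns and both index names are hypothetical and exist only for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.execute("CREATE INDEX idx_users_city ON users (city)")            # standard index
conn.execute("CREATE UNIQUE INDEX idx_users_email ON users (email)")   # unique index

conn.execute("INSERT INTO users (email, city) VALUES ('a@example.com', 'Oslo')")
# Same city again: fine, a standard index allows duplicate values.
conn.execute("INSERT INTO users (email, city) VALUES ('b@example.com', 'Oslo')")

# Same email again: the unique index refuses the row.
try:
    conn.execute("INSERT INTO users (email, city) VALUES ('a@example.com', 'Rome')")
    duplicate_email_rejected = False
except sqlite3.IntegrityError:
    duplicate_email_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(duplicate_email_rejected, row_count)
```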

One last important thing to note is - indexes make it fast to find things in a table, but they come at a cost. Modifications to the table (insert, update, delete) become slower, because the indexes need to be updated as well. Keep that in mind and only add them where necessary.

Except for Primary Keys. ALWAYS add Primary Keys. That's an order! :)
