I am re-writing the SQL which lets a user search for any other user on our site and also shows their roles.
An an example, roles can be "Writer", "Editor", "Publisher".
Each role links a User to a Publication.
Users can take multiple roles within multiple publications.
Example table setup:
"users" : user_id, firstname, lastname
"publications" : publication_id, name
"link_writers" : user_id, publication_id
"link_editors" : user_id, publication_id
Current psuedo SQL:
SELECT * FROM (
(SELECT user_id FROM users WHERE firstname LIKE '%Jenkz%')
UNION
(SELECT user_id FROM users WHERE lastname LIKE '%Jenkz%')
) AS dt
JOIN (ROLES STATEMENT) AS roles ON roles.user_id = dt.user_id
At the moment my roles statement is:
SELECT dt2.user_id, dt2.publication_id, dt.role FROM (
(SELECT 'writer' AS role, link_writers.user_id, link_writers.publication_id
FROM link_writers)
UNION
(SELECT 'editor' AS role, link_editors.user_id, link_editors.publication_id
FROM link_editors)
) AS dt2
The reason for wrapping the roles statement in UNION clauses is that some roles are more complex and require a table join to find the publication_id and user_id.
As an example "publishers" might be linked accross two tables
"link_publishers": user_id, publisher_group_id
"link_publisher_groups": publisher_group_id, publication_id
So in that instance, the query forming part of my UNION would be:
SELECT 'publisher' AS role, link_publishers.user_id, link_publisher_groups.publication_id
FROM link_publishers
JOIN link_publisher_groups ON lpg.group_id = lp.group_id
I'm pretty confident that my table setup is good (I was warned off the one-table-for-all system when researching the layout). My problem is that there are now 100,000 rows in the users table and upto 70,000 rows in each of the link tables.
Initial lookup in the users table is fast, but the joining really slows things down.
How can I only join on the relevant roles?
-------------------------- EDIT ----------------------------------
Explain above (open in a new window to see full resolution).
The bottom bit in red, is the "WHERE firstname LIKE '%Jenkz%'" the third row searches WHERE CONCAT(firstname, ' ', lastname) LIKE '%Jenkz%'. Hence the large row count, but I think this is unavoidable, unless there is a way to put an index accross concatenated fields?
The green bit at the top just shows the total rows scanned from the ROLES STATEMENT.
You can then see each individual UNION clause (#6 - #12) which all show a large number of rows. Some of the indexes are normal, some are unique.
It seems that MySQL isn't optimizing to use the dt.user_id as a comparison for the internal of the UNION statements. Is there any way to force this behaviour?
Please note that my real setup is not publications and writers but "webmasters", "players", "teams" etc.
My initial idea was to create a temporary table to hold (and index) the user_id that match the name, and use that to join against each link table. Unfortunately, in MySQL a temporary table can only be joined against ONCE in a query.
The nasty workaround is to create a permanent table, adding the connection_id to the primary key so separate sessions do not get confused.
create table tt ( connection_id int not null,
user_id int not null,
firstname varchar(10) not null,
lastname varchar(10) not null,
primary key( connection_id, user_id ) );
The following sequence would be repeated each time you need an answer:
delete from tt where connection_id = connection_id();
insert into tt
SELECT connection_id(), user_id, firstname, lastname FROM users
WHERE firstname LIKE '%Jenkz%'
UNION
SELECT connection_id(), user_id, firstname, lastname FROM users
WHERE lastname LIKE '%Jenkz%';
Next, your existing UNION is extended so that only the relevant user_id is pulled out:
SELECT 'writer' AS role, link_writers.user_id, link_writers.publication_id
FROM link_writers
JOIN tt ON tt.connection_id = connection_id() and tt.user_id = link_writers.user_id
UNION
SELECT 'editor' AS role, link_editors.user_id, link_editors.publication_id
FROM link_editors
JOIN tt ON tt.connection_id = connection_id() and tt.user_id = link_editors.user_id
UNION
SELECT 'publisher' AS role, link_publishers.user_id, link_publisher_groups.publication_id
FROM link_publishers
JOIN link_publisher_groups
ON link_publisher_groups.publisher_group_id = link_publishers.publisher_group_id
JOIN tt ON tt.connection_id = connection_id() and tt.user_id = link_publishers.user_id
Perhaps this will be an improvement because not every row of all the link tables is being pulled into the union.
The EXPLAIN is a bit peculiar in that only 4 bytes of the index on tt is being used - where I would have expected all 8 bytes. Perhaps this is because I have so little data in tt.
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: tt
type: ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: const
rows: 1
Extra: Using index
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: link_writers
type: ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.tt.user_id
rows: 1
Extra: Using index
*************************** 3. row ***************************
id: 2
select_type: UNION
table: tt
type: ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: const
rows: 1
Extra: Using index
*************************** 4. row ***************************
id: 2
select_type: UNION
table: link_editors
type: ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.tt.user_id
rows: 1
Extra: Using index
*************************** 5. row ***************************
id: 3
select_type: UNION
table: tt
type: ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: const
rows: 1
Extra: Using index
*************************** 6. row ***************************
id: 3
select_type: UNION
table: link_publishers
type: ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.tt.user_id
rows: 1
Extra: Using index
*************************** 7. row ***************************
id: 3
select_type: UNION
table: link_publisher_groups
type: ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: test.link_publishers.publisher_group_id
rows: 2
Extra: Using index
*************************** 8. row ***************************
id: NULL
select_type: UNION RESULT
table: <union1,2,3>
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: NULL
Extra:
8 rows in set (0.00 sec)
Having checked out OMG Ponies' answer for SO - Use Of Correlated Subquery , I came up with this:
SELECT * FROM (
(SELECT user_id FROM users WHERE firstname LIKE '%Jenkz%')
UNION
(SELECT user_id FROM users WHERE lastname LIKE '%Jenkz%')
) AS dt
JOIN ( SELECT 'writer' AS role, link_writers.user_id, link_writers.publication_id
FROM link_writers
UNION
SELECT 'editor' AS role, link_editors.user_id, link_editors.publication_id
FROM link_editors
UNION
SELECT 'publisher' AS role, lp.user_id, lpg.publication_id
FROM link_publishers lp
JOIN link_publisher_groups lpg ON lpg.publisher_group_id = lp.publisher_group_id
) roles on roles.user_id = dt.user_id
The explain looks reasonable on my tiny data set. What does it look like on the real thing ?
Yet another approach is to slightly denormalise your design to better support your query.
To do this, create a new table "role":
create table role (
user_id int not null,
role enum ('writer', 'editor', 'publisher' ) not null,
primary key (user_id, role )
);
This would be updated whenever a new row is added to one of your link tables that includes a user_id:
insert ignore into role values( $user_id, $role );
After a while, it is likely that the role entry will already exist, hence the "ignore" modifier.
The table could be boot-strapped from your existing tables:
insert ignore into role select distinct user_id, 'writer' from link_writers;
insert ignore into role select distinct user_id, 'editor' from link_editors;
insert ignore into role select distinct user_id, 'publisher' from link_publishers;
Your search query then becomes a set of simple JOINS which MySQL should have no problem optimising:
SELECT
r.user_id,
r.role,
case r.role
when 'writer' then w.publication_id
when 'editor' then e.publication_id
when 'publisher' then pg.publication_id
end as publication_id
FROM (
(SELECT user_id FROM users WHERE firstname LIKE '%Jenkz%')
UNION
(SELECT user_id FROM users WHERE lastname LIKE '%Jenkz%')
) AS dt
JOIN role r on r.user_id = dt.user_id
LEFT JOIN link_writers w on r.user_id = w.user_id and r.role = 'writer'
LEFT JOIN link_editors e on r.user_id = e.user_id and r.role = 'editor'
LEFT JOIN link_publishers p on r.user_id = p.user_id and r.role = 'publisher'
LEFT JOIN link_publisher_groups pg on p.publisher_group_id = pg.publisher_group_id;
This will give a very "wide" answer.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.