
MySQL Query Optimization with subqueries and JOINs on large database

I am new here and have already searched the whole internet for a solution. If I missed something, please let me know!

I use a MariaDB (MySQL) database with several million entries. My query times out, and even increasing the timeout does not help. The goal of the query is to retrieve the relevant entries per year. The query must join a total of 5 tables. (Unfortunately the structure of the DB is predefined; there is little I can change.)

Below is the affected query. The date is stored as varchar, which is why I use the LIKE operator with a leading % wildcard. Unfortunately I don't have the permissions to change it.

SELECT
    spCit.spPN,
    spCit.ipc1,
    spCit.ipc2,
    tbl_patinfo.pn
FROM
    tbl_patinfo
INNER JOIN(
    SELECT
        spIPC.spPN,
        spIPC.ipc1,
        spIPC.ipc2,
        tbl_patcit.pc_pn AS spDocNr
    FROM
        tbl_patcit
    INNER JOIN(
        SELECT DISTINCT
            tbl_ipc.pn AS spPN,
            tbl_ipc.value AS ipc1,
            tbl_ipc.main_cl AS ipc2
        FROM
            tbl_ipc
        RIGHT JOIN(
            SELECT
                tbl_sp.sp_CompanyAlias,
                OrgName,
                infoPN
            FROM
                tbl_sp
            INNER JOIN(
                SELECT DISTINCT
                    tbl_patinfo.pn AS infoPN,
                    tbl_adr.orgname AS OrgName
                FROM
                    tbl_patinfo
                LEFT JOIN tbl_adr ON tbl_patinfo.pn = tbl_adr.pn
                WHERE
                    tbl_patinfo.pub LIKE "%2004"
            ) AS PatPerYear
        ON
            tbl_sp.sp_CompanyAlias = PatPerYear.Orgname
        ) AS spPatents
    ON
        tbl_ipc.pn = spPatents.infoPN
    ) AS spIPC
ON
    tbl_patcit.pn = spIPC.spPN
) AS spCit
ON
    tbl_patinfo.docnr = spCit.spDocNr

Note: The whole query works if I leave out the last JOIN (tbl_patinfo.docnr = spCit.spDocNr), so that step is probably the cause.

EXPLAIN SELECT of the query

Thanks in advance for your help. If I can provide any further information, please let me know.

---EDIT---

I managed to change the date type, and now I am able to run the query with a limit of 50 rows. Without the limit I still get a timeout (over 7000 s processing time).

I also reduced the VARCHAR(255) length on the indexed columns wherever I could. I am not sure whether this has an impact, but it reduced the key_len (see the latest EXPLAIN SELECT). Here is the new code:

    SELECT
        spCit.spPN,
        spCit.ipc1,
        spCit.ipc2,
        tbl_patinfo.pn
    FROM
        tbl_patinfo
    INNER JOIN(
        SELECT
            spIPC.spPN,
            spIPC.ipc1,
            spIPC.ipc2,
            tbl_patcit.pc_pn AS spDocNr
        FROM
            tbl_patcit
        INNER JOIN(
            SELECT DISTINCT
                tbl_ipc.pn AS spPN,
                tbl_ipc.value AS ipc1,
                tbl_ipc.main_cl AS ipc2
            FROM
                tbl_ipc
            RIGHT JOIN(
                SELECT
                    tbl_sp.sp_CompanyAlias,
                    OrgName,
                    infoPN
                FROM
                    tbl_sp
                INNER JOIN(
                    SELECT DISTINCT
                        tbl_patinfo.pn AS infoPN,
                        tbl_adr.orgname AS OrgName
                    FROM
                        tbl_patinfo
                    LEFT JOIN tbl_adr ON tbl_patinfo.pn = tbl_adr.pn
                    WHERE
                        YEAR(tbl_patinfo.pub) = 2004
                ) AS PatPerYear
            ON
                tbl_sp.sp_CompanyAlias = PatPerYear.Orgname
            ) AS spPatents
        ON
            tbl_ipc.pn = spPatents.infoPN
        ) AS spIPC
    ON
        tbl_patcit.pn = spIPC.spPN
    ) AS spCit
    ON
        tbl_patinfo.docnr = spCit.spDocNr

Here's the new EXPLAIN SELECT

I'm very thankful for any ideas!

You are out of luck on that one. A LIKE '%...' filter is unindexable, because an index can only be used to match a known prefix of the value. There are things you could do about it, e.g. create a generated (computed) column, or an ordinary column maintained by a trigger, that contains the reverse of tbl_patinfo.pub. Then instead of tbl_patinfo.pub LIKE '%2004' you could say tbl_patinfo.revpub LIKE '4002%', which is a prefix match and can use an index on revpub.
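
A minimal sketch of the reversed-column idea, assuming MariaDB 10.2 or later, that tbl_patinfo.pub is a VARCHAR(255) whose value ends in the four-digit year, and that you are allowed to alter the table. The column name revpub comes from the text above; the index name idx_patinfo_revpub and the 8-character index prefix are illustrative choices, not anything taken from the original schema.

```sql
-- Assumption: pub is VARCHAR(255) and ends with the 4-digit year.
-- Add a stored generated column that always holds the reversed string.
ALTER TABLE tbl_patinfo
    ADD COLUMN revpub VARCHAR(255) AS (REVERSE(pub)) PERSISTENT;

-- Index only the first 8 characters (hypothetical length): enough to
-- cover the reversed year while keeping the index keys short.
CREATE INDEX idx_patinfo_revpub ON tbl_patinfo (revpub(8));

-- The suffix search pub LIKE '%2004' becomes a prefix search,
-- which the optimizer can answer with a range scan on the index.
SELECT pn
FROM tbl_patinfo
WHERE revpub LIKE '4002%';
```

Running EXPLAIN on the final SELECT should show whether idx_patinfo_revpub is actually chosen. If the table cannot be altered at all, the same filter could be applied to a separately maintained copy of the relevant columns instead.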
