Simple select query takes more time in very large table in MySQL database in C# application

I am using a MySQL database in my ASP.NET web application written in C#. The MySQL Server version is 5.7 and the PC has 8 GB of RAM. When I run a select query against a table in this database, it takes a long time: a simple select query takes around 42 seconds. The table holds 1 crore (10 million) records. I have also indexed the table. How can I fix this?

The following is my table structure.

CREATE TABLE `smstable_read` (
    `MessageID` int(11) NOT NULL AUTO_INCREMENT,
    `ApplicationID` int(11) DEFAULT NULL,
    `Api_userid` int(11) DEFAULT NULL,
    `ReturnMessageID` varchar(255) DEFAULT NULL,
    `Sequence_Id` int(11) DEFAULT NULL,
    `messagetext` longtext,
    `adtextid` int(11) DEFAULT NULL,
    `mobileno` varchar(255) DEFAULT NULL,
    `deliverystatus` int(11) DEFAULT NULL,
    `SMSlength` int(11) DEFAULT NULL,
    `DOC` varchar(255) DEFAULT NULL,
    `DOM` varchar(255) DEFAULT NULL,
    `BatchID` int(11) DEFAULT NULL,
    `StudentID` int(11) DEFAULT NULL,
    `SMSSentTime` varchar(255) DEFAULT NULL,
    `SMSDeliveredTime` varchar(255) DEFAULT NULL,
    `SMSDeliveredTimeTicks` decimal(28,0) DEFAULT '0',
    `SMSSentTimeTicks` decimal(28,0) DEFAULT '0',
    `Sent_SMS_Day` int(11) DEFAULT NULL,
    `Sent_SMS_Month` int(11) DEFAULT NULL,
    `Sent_SMS_Year` int(11) DEFAULT NULL,
    `smssent` int(11) DEFAULT '1',
    `Batch_Name` varchar(255) DEFAULT NULL,
    `User_ID` varchar(255) DEFAULT NULL,
    `Year_ID` int(11) DEFAULT NULL,
    `Date_Time` varchar(255) DEFAULT NULL,
    `IsGroup` double DEFAULT NULL,
    `Date_Time_Ticks` decimal(28,0) DEFAULT NULL,
    `IsNotificationSent` int(11) DEFAULT NULL,
    `Module_Id` double DEFAULT NULL,
    `Doc_Batch` decimal(28,0) DEFAULT NULL,
    `SMS_Category_ID` int(11) DEFAULT NULL,
    `SID` int(11) DEFAULT NULL,
    PRIMARY KEY (`MessageID`),
    KEY `index2` (`ReturnMessageID`),
    KEY `index3` (`mobileno`),
    KEY `BatchID` (`BatchID`),
    KEY `smssent` (`smssent`),
    KEY `deliverystatus` (`deliverystatus`),
    KEY `day` (`Sent_SMS_Day`),
    KEY `month` (`Sent_SMS_Month`),
    KEY `year` (`Sent_SMS_Year`),
    KEY `index4` (`ApplicationID`,`SMSSentTimeTicks`),
    KEY `smslength` (`SMSlength`),
    KEY `studid` (`StudentID`),
    KEY `batchid_studid` (`BatchID`,`StudentID`),
    KEY `User_ID` (`User_ID`),
    KEY `Year_Id` (`Year_ID`),
    KEY `IsNotificationSent` (`IsNotificationSent`),
    KEY `isgroup` (`IsGroup`),
    KEY `SID` (`SID`),
    KEY `SMS_Category_ID` (`SMS_Category_ID`),
    KEY `SMSSentTimeTicks` (`SMSSentTimeTicks`)
) ENGINE=MyISAM AUTO_INCREMENT=16513292 DEFAULT CHARSET=utf8;

The following is my select query:

SELECT messagetext, SMSSentTime, StudentID, batchid,
User_ID,MessageID,Sent_SMS_Day, Sent_SMS_Month,
Sent_SMS_Year,Module_Id,Year_ID,Doc_Batch
FROM smstable_read
WHERE StudentID=977 AND SID = 8582 AND MessageID>16013282

You need to learn about compound indexes and covering indexes. Read about those things.

Your query is slow because it's doing a half-scan of the table. It uses the primary key to find the first row with a qualifying MessageID, then looks at every row after that point to find the ones that match the other two conditions.

Your filter criteria are StudentID = constant, SID = constant, and MessageID > constant. That means you need those three columns, in that order, in an index. The first two filter criteria will random-access your index to the correct place. The third criterion will scan the index starting right after the constant value in your query. It's called an Index Range Scan operation, and it's quite efficient.

ALTER TABLE smstable_read
  ADD INDEX StudentSidMessage (StudentId, SID, MessageId);

This compound index should make your query efficient. Notice that in MyISAM, secondary indexes do not implicitly carry the primary key, so the primary key column has to be listed explicitly in a compound index if you want it there. That's cool in this case because it's also part of your query criteria.
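One way to check whether the new index is actually picked up is to run the original query through EXPLAIN. This is only a sketch: it assumes the StudentSidMessage index from the ALTER TABLE statement above has already been created, and the optimizer is free to choose a different plan depending on the data.

```sql
-- Show the execution plan instead of running the query.
EXPLAIN
SELECT messagetext, SMSSentTime, StudentID, batchid,
       User_ID, MessageID, Sent_SMS_Day, Sent_SMS_Month,
       Sent_SMS_Year, Module_Id, Year_ID, Doc_Batch
FROM smstable_read
WHERE StudentID = 977 AND SID = 8582 AND MessageID > 16013282;
```

If the compound index is used, the key column of the output should name StudentSidMessage and the type column should read range rather than ALL. The rows estimate should also drop to roughly the number of matching messages.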

If this query is used very frequently, you could make a covering index: you could add the other columns of the query (the ones mentioned in your SELECT clause) to the index.

But, unfortunately, you have defined your messageText column with a longtext data type. That allows each message to contain up to four gigabytes. (Why? Is this really SMS data? A single SMS is limited to 160 seven-bit characters, or 140 bytes. Four gigabytes >> 160 characters.)

Now the point of a covering index is to allow the query to be satisfied entirely from the index, without referring back to the table. But when you include a longtext or any other LOB column in an index, MySQL requires a prefix length, so the index only contains the leading part of the data. So the point of the covering index is lost.

If I were you I would change my table so messageText was a VARCHAR(255) data type, and then create this covering index:

ALTER TABLE smstable_read
  ADD INDEX StudentSidMessage (StudentId, SID, MessageId,
            SMSSentTime, batchid,
            User_ID, Sent_SMS_Day, Sent_SMS_Month,
            Sent_SMS_Year,Module_Id,Year_ID,Doc_Batch,
            messageText);

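(Notice that you should put variable-length items last in the index if you can. Also be aware that MyISAM caps the total key length at 1000 bytes, and a utf8 VARCHAR(255) column can count for up to 765 of them, so this index will likely be rejected as too long unless SMSSentTime, User_ID, and messageText are declared considerably shorter.)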

If you can't change your application to handle VARCHAR(255) then go with the first index I mentioned.
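To judge whether the VARCHAR(255) change is safe for the existing data, one could first look at the longest message actually stored. The following is a minimal sketch, not a tested migration: the longest_message alias is made up for illustration, and the ALTER TABLE should only be run if the first query returns 255 or less, since longer values would otherwise be truncated or rejected depending on the SQL mode.

```sql
-- Length, in characters, of the longest stored message.
SELECT MAX(CHAR_LENGTH(messagetext)) AS longest_message
FROM smstable_read;

-- Only if longest_message is 255 or less:
-- shrink the column so it can be indexed in full.
ALTER TABLE smstable_read
  MODIFY messagetext VARCHAR(255) DEFAULT NULL;
```

On a 10 million row MyISAM table the ALTER rebuilds the whole table, so it is worth trying on a copy first.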

Pro tip: putting lots of single-column indexes on MySQL tables rarely helps SELECT performance and always harms INSERT and UPDATE performance. You need an index on your primary key, and you need indexes to support the queries you run. Extra indexes are harmful.
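As an illustration of pruning extra indexes, the table definition in the question has a single-column BatchID index alongside batchid_studid (BatchID, StudentID). Because MySQL can use the leftmost column of a compound index on its own, the single-column index looks redundant. A hedged sketch, assuming no query relies on an index hint that names BatchID:

```sql
-- List every index on the table, one row per indexed column.
SHOW INDEX FROM smstable_read;

-- BatchID is the leftmost column of batchid_studid,
-- so the single-column index adds write cost without new lookups.
ALTER TABLE smstable_read
  DROP INDEX BatchID;
```

The same reasoning applies to any other single-column index whose column leads a compound index. Whether the remaining single-column indexes are needed depends on the queries the application actually runs.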

It looks like your database is not properly indexed and not even properly normalized. Normalizing your database will go a long way toward speeding up all your queries, particularly in view of the fact that MySQL generally uses only one index per table in a query. Even though you have lots of indexes, most of them cannot be used by this query.

Your current query filters on StudentID, SID, and MessageID. The last is an inequality comparison, so an index will not be as effective for it, but the other two columns are equality comparisons. I suggest an index like this:

KEY `studid` (`StudentID`,`SID`)

Follow that up by dropping your existing index on SID. If you find that you don't want to drop it because it's used in another query, that is further evidence that your table is in desperate need of normalization.
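Put together, the suggestion might look like the statement below. This is a sketch under assumptions: the table in the question already has an index named studid on StudentID alone, so the new index is given the hypothetical name studid_sid here to avoid a name clash.

```sql
ALTER TABLE smstable_read
  -- Both equality columns in one compound index.
  ADD INDEX studid_sid (StudentID, SID),
  -- The single-column SID index, as suggested above.
  DROP INDEX SID;
```

Once studid_sid exists, the old studid index is covered by its leftmost column and could be dropped as well.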

Too many indexes slow down inserts and add a little overhead to each SELECT, because the query planner needs more effort to figure out which index to use.
