Speed up MySQL inner join with LIKE clause

I have the following two tables, api_analytics_data and telecordia.

CREATE TABLE `api_analytics_data` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `upload_file_id` bigint(20) NOT NULL,
  `partNumber` varchar(100) DEFAULT NULL,
  `clei` varchar(45) DEFAULT NULL,
  `description` varchar(150) DEFAULT NULL,
  `processed` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`id`),
  KEY `idx_aad_clei` (`clei`),
  KEY `idx_aad_pn` (`partNumber`),
  KEY `id_aad_processed` (`processed`),
  KEY `idx_combo1` (`partNumber`,`clei`,`upload_file_id`)
) ENGINE=InnoDB CHARSET=latin1;

CREATE TABLE `telecordia` (
  `tid` int(11) NOT NULL AUTO_INCREMENT,
  `ProdID` varchar(50) DEFAULT NULL,
  `Mfg` varchar(20) DEFAULT NULL,
  `Pn` varchar(50) DEFAULT NULL,
  `Clei` varchar(50) DEFAULT NULL,
  `Series` varchar(50) DEFAULT NULL,
  `Dsc` varchar(50) DEFAULT NULL,
  `Eci` varchar(50) DEFAULT NULL,
  `AddDate` date DEFAULT NULL,
  `ChangeDate` date DEFAULT NULL,
  `Cost` float DEFAULT NULL,
  PRIMARY KEY (`tid`),
  KEY `telecordia.ProdID` (`ProdID`) USING BTREE,
  KEY `telecordia.clei` (`Clei`),
  KEY `telecordia.pn` (`Pn`),
  KEY `telcordia.eci` (`Eci`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Users upload Excel/CSV files through a web interface into api_analytics_data. Each file contains EITHER part numbers or CLEIs. I then update the api_analytics_data table by joining it to the telecordia table, which is the master list of part numbers and CLEIs.

So if a user uploads a file of CLEIs, the update/join I use is:

update api_analytics_data aad
  inner join telecordia t on aad.clei = t.Clei
  set aad.partNumber = t.Pn
  where aad.partNumber is null
  and aad.upload_file_id = 5;

It works quickly, but not very thoroughly. The problem I have is that the CLEI uploaded may only be a substring of the CLEI in the telecordia table.

For example, the uploaded CLEI may be "5SC1DX0". In the telecordia table, the correct matching row is:

tid:        184324
ProdID:     472467
Mfg:        PLSE
Pn:         AUA58-2-REV-E
Clei:       5SC1DX04AA
Series:     null
Dsc:        DL SGL-PTY POTS CU RT
Eci:        205756
AddDate:    1994-03-18
ChangeDate: 1998-04-13
Cost:       null

So obviously my update doesn't work in this case, even though 5SC1DX0 and 5SC1DX04AA are the same part.

What I need is a wildcard search. However, when I try this, it is crazy slow. With about 4500 rows uploaded into the api_analytics_data table, it runs for about 10 minutes, and then loses the connection with the server.

update api_analytics_data aad
  inner join telecordia t on aad.clei like concat(t.Clei,'%')
  set aad.partNumber = t.Pn
  where aad.partNumber is null 
  and aad.upload_file_id = 5;

Is there a way to optimize this so that it runs quickly?

The correct answer is "no". The better course of action is to create a new column in telecordia that holds the correct Clei value, one that can be used for joining the tables with a plain equality comparison. In MySQL 5.7 and later, this can even be a generated (computed) column, and it can be indexed.
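As a rough sketch of that idea, assuming the uploaded CLEIs always correspond to the first 7 characters of telecordia.Clei (as in the 5SC1DX0 example), the generated column and the rewritten update might look like this. The column name clei_prefix and the index name are made up for illustration, and the prefix length is an assumption, not something the data guarantees.

```sql
-- Hypothetical generated column: the first 7 characters of Clei.
-- Assumes MySQL 5.7+ and that 7 is the right prefix length for your data.
ALTER TABLE telecordia
  ADD COLUMN clei_prefix VARCHAR(7)
    GENERATED ALWAYS AS (LEFT(Clei, 7)) STORED;

-- Index the new column (with Pn included so the lookup can be covered).
CREATE INDEX idx_telecordia_clei_prefix ON telecordia (clei_prefix, Pn);

-- The join is now a plain equality, so the index can be used for each uploaded row.
UPDATE api_analytics_data aad
  INNER JOIN telecordia t ON t.clei_prefix = aad.clei
  SET aad.partNumber = t.Pn
  WHERE aad.partNumber IS NULL
    AND aad.upload_file_id = 5;
```

If several telecordia rows share the same 7-character prefix, the update will pick one of their Pn values without any guaranteed order, so it is worth checking for duplicate prefixes first.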

That said, you might be able to do something if the matching portion is always the same length. If so, try this:

update api_analytics_data aad inner join
       telecordia t
       on t.Clei = left(aad.clei, 7)
  set aad.partNumber = t.Pn
  where aad.partNumber is null and aad.upload_file_id = 5;

For this query, you want an index on api_analytics_data(upload_file_id, partNumber, clei) and telecordia(clei, pn).
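Those two indexes could be created along these lines. The index names are placeholders, and the EXPLAIN at the end is only a suggested way to check whether the optimizer actually picks them up.

```sql
-- Index names are hypothetical; choose whatever fits your naming scheme.
CREATE INDEX idx_aad_file_pn_clei
  ON api_analytics_data (upload_file_id, partNumber, clei);

CREATE INDEX idx_telecordia_clei_pn
  ON telecordia (Clei, Pn);

-- Inspect the plan for the fixed-length update (EXPLAIN UPDATE needs MySQL 5.6+).
EXPLAIN
UPDATE api_analytics_data aad
  INNER JOIN telecordia t ON t.Clei = LEFT(aad.clei, 7)
  SET aad.partNumber = t.Pn
  WHERE aad.partNumber IS NULL
    AND aad.upload_file_id = 5;
```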

