I need to find the rows from table t1 that have a unique (TRAN_ID,CMTE_ID) pair, where TRAN_ID and CMTE_ID are two of the columns. Then I'd like to insert these rows into the table uniques.
The problem is that the table uniques seems to end up containing duplicate pairs.
Note: table t1 was created using the InnoDB engine and then converted to the MyISAM engine in order to speed up GROUP BY and join operations. t1 has 130 million rows.
Here's my create query:
DROP TABLE IF EXISTS uniques;
CREATE TABLE `uniques` (
`CMTE_ID` varchar(9) DEFAULT '',
`TRAN_ID` varchar(32) DEFAULT '',
KEY `TRAN_INDEX` (`TRAN_ID`,`CMTE_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
Then I run the query and insert into uniques:
LOCK TABLES uniques write, t1 write;
INSERT INTO uniques
SELECT TRAN_ID,CMTE_ID
FROM t1
GROUP BY TRAN_ID,CMTE_ID
HAVING count(*) = 1;
UNLOCK TABLES;
At this point, I expect uniques to be populated with rows with unique (TRAN_ID,CMTE_ID) pairs. However, when I run
SELECT * FROM uniques
GROUP BY TRAN_ID,CMTE_ID
having count(*) > 1;
I still get a long list of rows. What's going on?
You might want to put a unique constraint on the pair to prevent duplicates.
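As a rough sketch of that suggestion, the table definition from the question could declare its index as UNIQUE instead of as a plain KEY. Everything except the UNIQUE keyword is copied from the question, and this has not been tested against the asker's data:

```sql
-- Same table as in the question, but the (TRAN_ID, CMTE_ID) index is UNIQUE,
-- so a second row with the same pair is rejected with a duplicate-key error
-- instead of being stored silently.
DROP TABLE IF EXISTS uniques;
CREATE TABLE `uniques` (
  `CMTE_ID` varchar(9) DEFAULT '',
  `TRAN_ID` varchar(32) DEFAULT '',
  UNIQUE KEY `TRAN_INDEX` (`TRAN_ID`,`CMTE_ID`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
```

A constraint like this does not fix the cause of the duplicates, but it makes the INSERT fail loudly at the first repeated pair, which helps narrow down where they come from.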
First guesses are operator error or the table already had data. Discounting those, there is another possibility. The types of the fields are:
`CMTE_ID` varchar(9) DEFAULT '',
`TRAN_ID` varchar(32) DEFAULT '',
Perhaps these are not big enough, so the data is actually being truncated when loaded into the table. This is just an idea. Your process seems sound.
EDIT:
Actually, I think the last is what is happening. Your insert query is equivalent to:
INSERT INTO uniques(CMTE_ID, TRAN_ID)
SELECT TRAN_ID,CMTE_ID
FROM t1
GROUP BY TRAN_ID,CMTE_ID
HAVING count(*) = 1;
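Note that the column orders are different, so TRAN_ID is being loaded into CMTE_ID and vice versa. Because the column sizes are different, the TRAN_ID values are probably being truncated to fit the varchar(9) CMTE_ID column, so pairs that were distinct in t1 can become identical in uniques.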
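One way to check the truncation theory is sketched below. It assumes the server runs in a non-strict SQL mode, where an over-long string is cut down with a warning rather than rejected with an error. The column aliases max_tran_len and max_cmte_len are made up for illustration.

```sql
-- Run immediately after the INSERT ... SELECT, in the same session:
-- "Data truncated for column ..." warnings would confirm the theory.
SHOW WARNINGS;

-- Compare the longest stored values with the target column sizes.
-- If max_tran_len is greater than 9, TRAN_ID values cannot fit into
-- the varchar(9) CMTE_ID column without losing characters.
SELECT MAX(CHAR_LENGTH(TRAN_ID)) AS max_tran_len,
       MAX(CHAR_LENGTH(CMTE_ID)) AS max_cmte_len
FROM t1;
```

On a table with 130 million rows the second query needs a full scan, so it may take a while.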
This is a good lesson in why you should always include column lists in insert statements.
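Applied to the query in the question, an explicit column list might look like the following. This is a sketch that keeps the asker's table and column names and only adds the column list:

```sql
LOCK TABLES uniques WRITE, t1 WRITE;

-- Name the target columns in the same order as the SELECT list, so each
-- value lands in the column it was read from regardless of how the
-- columns are ordered in the CREATE TABLE statement.
INSERT INTO uniques (TRAN_ID, CMTE_ID)
SELECT TRAN_ID, CMTE_ID
FROM t1
GROUP BY TRAN_ID, CMTE_ID
HAVING COUNT(*) = 1;

UNLOCK TABLES;
```

With the columns matched up, the duplicate check from the question (GROUP BY TRAN_ID,CMTE_ID having count(*) > 1) should return no rows, provided uniques was empty before the insert.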