MySQL getting duplicates from select distinct + union

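When I run the following query in MySQL, I get a lot of duplicates. I have stated explicitly that I only want distinct records, so I can't understand why rows are being doubled. The duplicates all seem to appear when I include the last union (the importorders table), since most customers have the same address in both customers and orders. Can anyone help me understand why this happens?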

SELECT DISTINCT PostalCode, City, Region, Country
FROM 
(select distinct postalcode, city, region, country
from importemployees
UNION
select distinct postalcode, city, region, country
from importcustomers
UNION
select distinct postalcode, city, region, country
from importproducts
UNION
select distinct shippostalcode as postalcode, shipcity as city, shipregion as region, shipcountry as country
from importorders) T

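Query and results

As you can see, some rows are duplicated.

If I first insert importcustomers using INSERT IGNORE and then importorders, MySQL does manage to identify the records as duplicates. Why doesn't the SELECT query work?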

A very strange problem. It seems to go away when I drop Country.

SELECT DISTINCT PostalCode, City, Region

128 total, query took 0.0066 seconds

SELECT DISTINCT PostalCode, City, Region, Country

209 total, query took 0.0002 seconds

In addition, the behavior only seems to affect ImportCustomers and ImportOrders:

SELECT postalcode, city, region, country
FROM 
    (SELECT postalcode, city, region, country FROM importcustomers
    UNION
    SELECT shippostalcode, shipcity, shipregion, shipcountry FROM importorders) t

172 total, query took 0.0053 seconds

SELECT postalcode
FROM 
    (SELECT postalcode FROM importcustomers
    UNION
    SELECT shippostalcode FROM importorders) t

91 total, query took 0.0050 seconds

I then narrowed it down to the country columns of importcustomers and importorders. Note that TRIM does not help here:

SELECT TRIM(country) AS country FROM importcustomers
UNION
SELECT TRIM(shipcountry) AS country FROM importorders
Argentina
Argentina
Austria
Austria
Belgium
Belgium
...

Something interesting happened when I cast the columns to BINARY:

SELECT BINARY country AS country FROM importcustomers
UNION
SELECT BINARY shipcountry AS country FROM importorders
Argentina
417267656e74696e610d
Austria
417573747269610d
Belgium
42656c6769756d0d
...

ImportOrders is the table causing the duplicates:

 SELECT BINARY shipcountry AS country FROM importorders
4765726d616e790d
5553410d
5553410d
4765726d616e790d
...

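Looking at the dump you provided, an extra \r (the trailing 0d in the hex values above) is appended to the end of each country in importorders.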

--
-- Dumping data for table `importorders`
--
INSERT INTO `importorders` VALUES 
...'Germany\r'),
...'USA\r'),
...'USA\r'),
...'Germany\r'),
...'Mexico\r'),

The country column in importcustomers looks fine:

--
-- Dumping data for table `importcustomers`
--
INSERT INTO `importcustomers` VALUES 
...'Germany', ... ,
...'Mexico', ... ,
...'Mexico', ... ,
...'UK', ... ,
...'Sweden', ... ,
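
To see why the two spellings never collapse into one row, here is a minimal sketch that reproduces the effect. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, chosen only so the example runs without a server), and the sample rows are made up for illustration rather than copied from the dump.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL schema (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE importcustomers (country TEXT);
    CREATE TABLE importorders (shipcountry TEXT);
""")

# Illustrative rows: the orders table carries a trailing carriage return.
conn.executemany("INSERT INTO importcustomers VALUES (?)",
                 [("Germany",), ("Mexico",)])
conn.executemany("INSERT INTO importorders VALUES (?)",
                 [("Germany\r",), ("Mexico\r",)])

# UNION removes exact duplicates only, so 'Germany' and 'Germany\r' both survive.
rows = conn.execute("""
    SELECT country FROM importcustomers
    UNION
    SELECT shipcountry FROM importorders
""").fetchall()
values = sorted(r[0] for r in rows)

# repr() and the hex dump make the otherwise invisible \r (0d) visible.
for v in values:
    print(repr(v), v.encode().hex())

# trim() without an explicit character list strips spaces only, not \r.
trimmed = [r[0] for r in conn.execute("SELECT trim(shipcountry) FROM importorders")]
```

In a terminal or phpMyAdmin both values render as plain Germany, which is why the result looks like a duplicate even though the rows differ by one byte.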

You can remove these \r characters (carriage returns) by running this query:

UPDATE importorders SET ShipCountry = REPLACE(ShipCountry, '\r', '')
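
Before running an UPDATE like this on real data, it can help to count the affected rows first and confirm afterwards that none remain. The following is a sketch of that check, again assuming Python's sqlite3 as a stand-in for MySQL and using made-up rows. The carriage return is passed as a bound parameter because SQLite, unlike MySQL's default mode, does not interpret '\r' inside a SQL string literal.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE importorders (shipcountry TEXT)")
conn.executemany("INSERT INTO importorders VALUES (?)",
                 [("Germany\r",), ("USA\r",), ("USA\r",)])

CR = "\r"  # carriage return, bound as a parameter rather than written in SQL

def count_dirty(connection):
    """Count rows whose shipcountry contains a carriage return."""
    sql = "SELECT COUNT(*) FROM importorders WHERE instr(shipcountry, ?) > 0"
    return connection.execute(sql, (CR,)).fetchone()[0]

dirty_before = count_dirty(conn)

# Same idea as the REPLACE-based UPDATE above: strip every \r from the column.
conn.execute("UPDATE importorders SET shipcountry = replace(shipcountry, ?, '')", (CR,))

dirty_after = count_dirty(conn)
cleaned = sorted(r[0] for r in conn.execute("SELECT DISTINCT shipcountry FROM importorders"))
```

The stray \r usually comes from importing a file with Windows line endings, so the last column of each row is the one to check in the other imported tables as well.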

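If you then re-run your original query, you get the desired result set. FYI, you don't need DISTINCT when you use UNION, because UNION already removes duplicate rows: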

SELECT PostalCode, City, Region, Country
FROM 
    (SELECT postalcode, city, region, country FROM importemployees
    UNION
    SELECT postalcode, city, region, country FROM importcustomers
    UNION
    SELECT postalcode, city, region, country FROM importproducts
    UNION
    SELECT shippostalcode as postalcode, shipcity as city, 
        shipregion as region, shipcountry as country FROM importorders) T
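
The point about DISTINCT can be checked with a small sketch comparing UNION and UNION ALL. As before, this assumes Python's sqlite3 as a stand-in for MySQL, and the rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE importcustomers (country TEXT);
    CREATE TABLE importorders (shipcountry TEXT);
    INSERT INTO importcustomers VALUES ('Germany'), ('Mexico'), ('Mexico');
    INSERT INTO importorders VALUES ('Germany'), ('USA'), ('USA');
""")

# Plain UNION deduplicates across and within both SELECTs.
union_rows = sorted(r[0] for r in conn.execute("""
    SELECT country FROM importcustomers
    UNION
    SELECT shipcountry FROM importorders
"""))

# UNION ALL keeps every row, duplicates included.
union_all_rows = [r[0] for r in conn.execute("""
    SELECT country FROM importcustomers
    UNION ALL
    SELECT shipcountry FROM importorders
""")]
```

Since plain UNION already behaves like UNION DISTINCT, adding DISTINCT to each branch or to the outer SELECT does not change the result.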
