Why does MySQL's select count(distinct user_id) return the wrong number?

Question

I have a big table in MySQL. It has 13 million rows.

The MySQL version is 5.7.10.

The table structure is as follows:

create table table_name (    
  user_id varchar(20) not null,    
  item_id varchar(20) not null 
);

1. The first SQL statement is:

select count(distinct user_id) from table_name;

Result: 760,000

2. The second SQL statement is:

select count(1) from (select user_id from table_name group by user_id) a;

Result: 120,000

user_id is not null in every row.

The correct number is 120,000. Why does the first query return the wrong number?
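Because user_id is NOT NULL, the two statements count the same thing (the number of distinct user_id values), so a correct engine should return the same number for both. Here is a minimal sketch of that check. It assumes Python's built-in sqlite3 module and a small made-up data set rather than the real 13-million-row MySQL table, so it only illustrates the expected equivalence and does not reproduce the MySQL behaviour described above.

```python
import sqlite3

# Illustrative only: an in-memory SQLite table with made-up rows,
# standing in for the MySQL table from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table_name ("
    " user_id VARCHAR(20) NOT NULL,"
    " item_id VARCHAR(20) NOT NULL)"
)

# 1,300 rows spread over 120 distinct users (hypothetical data).
rows = [(f"u{i % 120}", f"i{i}") for i in range(1300)]
conn.executemany("INSERT INTO table_name VALUES (?, ?)", rows)

# First statement from the question: COUNT(DISTINCT ...).
count_distinct = conn.execute(
    "SELECT COUNT(DISTINCT user_id) FROM table_name"
).fetchone()[0]

# Second statement from the question: count the GROUP BY groups.
count_grouped = conn.execute(
    "SELECT COUNT(1) FROM (SELECT user_id FROM table_name GROUP BY user_id) a"
).fetchone()[0]

print(count_distinct, count_grouped)
```

If the two numbers differ on a real server, as they do in the question, the data is unlikely to be the cause, which points to the engine or its query plan.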

I then ran the first query in Hive and Spark SQL, and the result was 120,000.

So is this a MySQL bug, or is there a setting that would make it return the right result?

Thank you!

Update: I tried it on another PC running MySQL 5.6.26. There the first query returned 120,000, which is the right number. So it may be a bug in 5.7.10.

Answer 1

There are multiple known bugs in MySQL's COUNT(DISTINCT ...) when the column is part of a two-column unique key.

here and here

1 2017-03-03 14:20:01
