Extremely slow SQL query using COUNT and GROUP BY on two columns

Question

I'm archiving this web forum, which normally gets purged about once a week. So I'm screen scraping it and storing it in my database (PostgreSQL).

I also do a little analysis on the data, with some graphs for users to enjoy, such as what time of day the forum is most active, and so forth.

So I have a posts table, like so:

   Column   |            Type
------------+------------------------------
 id         | integer
 body       | text
 created_at | timestamp without time zone
 topic_id   | integer
 user_name  | text
 user_id    | integer

I now want a post count for each user, for my little top-10 posters table.

I came up with this:

SELECT user_id, user_name, count(*)
FROM posts
GROUP BY user_id, user_name
ORDER BY count DESC LIMIT 10

Which turns out to be very slow: 9 seconds, with only about 300,000 rows in the posts table at the moment.

It takes only half a second if I group on just one column, but I need both.

I'm rather new to relational databases and SQL, so I'm not sure whether this is right, or how I'm doing it wrong.

Answer 1

There's probably only one user with a particular ID, so max(user_name) should equal user_name. Then you can group on a single column, which your post indicates works faster:

SELECT user_id, max(user_name), count(*)
FROM posts
GROUP BY user_id
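
The query above drops the ordering and limit from the question. A sketch of the full top-10 version, assuming user_name really is unique per user_id (the post_count and user_name aliases are my own additions, not from the original posts):

```sql
-- Group on the integer key only; max() picks the single name per user_id.
SELECT user_id,
       max(user_name) AS user_name,
       count(*)       AS post_count
FROM posts
GROUP BY user_id
ORDER BY post_count DESC
LIMIT 10;
```

If a user has posted under more than one name, max(user_name) returns only the one that sorts last, so this is not strictly equivalent to grouping on both columns. Prefixing either query with EXPLAIN ANALYZE in PostgreSQL shows the plan and actual timings, which should help confirm where the 9 seconds go.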

Answer 2

You could also use count > 0, so that you only return true.


Answer 1
Score: 11 (accepted), 2010-02-20 17:22:51

Answer 2
Score: 0, 2010-02-20 17:59:55
