How to group by and count using MySQL

I have data which looks like this:

ID | post_author | post_title | guid
3309 | 21 | Should somebody not yet on SQL 2008 wait for SQL 2008 R2, since it's near release? | http://sql.stackexchange.com/questions/379/should-somebody-not-yet-on-sql-2008-wait-for-sql-2008-r2-since-its-near-release
1695 | 429 | How do we politely decline well meaning advice from the Grandmother? | http://moms4mom.stackexchange.com/questions/1208/how-do-we-politely-decline-well-meaning-advice-from-the-grandmother
556 | 173 | Books on how to be a great dad | http://moms4mom.stackexchange.com/questions/1042/books-on-how-to-be-a-great-dad
160 | 30 | Building an ice hockey net cam | http://photo.stackexchange.com/questions/8/building-an-ice-hockey-net-cam
159 | 30 | Generic commercial photo release form | http://photo.stackexchange.com/questions/4/generic-commercial-photo-release-form

I need to create a query that groups the data on part of the GUID field (the root URL) and counts the POST_AUTHOR values for each group.

The result I am looking for would be like this:

Site | Count of Authors
http://sql.stackexchange.com | 1
http://moms4mom.stackexchange.com | 2
http://photo.stackexchange.com | 2

I would be grateful if someone could help me construct the SQL.

SELECT COUNT(POST_AUTHOR) AS AUTHOR_COUNT, GUID FROM TABLE_NAME GROUP BY GUID

It may be possible to construct such a query, but it will not be optimized.

You should add a column to your table that holds the ID of the site. Then add a new table that holds pre-parsed data for each site: domain, path, resource, whether http or https, etc.

This way your searches can be more flexible and much faster, since I assume you have few inserts and a large number of reads.
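
The suggestion above could be sketched roughly as follows. This is only an illustration: the `sites` table, its columns, the `site_id` column, and the index names are hypothetical, `TABLE_NAME` is the same placeholder used earlier in the thread, and the default InnoDB settings of MySQL 5.7 or later are assumed.

```sql
-- Hypothetical lookup table: one row per site root.
CREATE TABLE sites (
    site_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    scheme  VARCHAR(8)   NOT NULL,   -- 'http' or 'https'
    domain  VARCHAR(255) NOT NULL,   -- e.g. 'photo.stackexchange.com'
    UNIQUE KEY uq_sites_scheme_domain (scheme, domain)
);

-- Hypothetical reference column on the posts table (TABLE_NAME is a placeholder).
ALTER TABLE TABLE_NAME
    ADD COLUMN site_id INT UNSIGNED NULL,
    ADD INDEX idx_site_id (site_id);

-- Grouping then uses the indexed integer column instead of parsing guid.
SELECT CONCAT(s.scheme, '://', s.domain) AS site,
       COUNT(p.post_author)              AS author_count
FROM TABLE_NAME AS p
JOIN sites AS s ON s.site_id = p.site_id
GROUP BY s.site_id, s.scheme, s.domain;
```

The `site_id` values would have to be filled in when rows are inserted (or backfilled once for existing rows); that step is not shown here.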

Write a SQL function, called for instance guid_extract(guid), that extracts the pertinent info. You can then use it as a column in your select:

SELECT stuff, otherstuff, guid_extract(guid) as site
  ...
  GROUP BY site;
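
One possible body for such a function is sketched below. Only the name guid_extract comes from the answer; the parameter and return types, and the choice of SUBSTRING_INDEX to take everything before the third slash, are assumptions made for illustration.

```sql
-- Hypothetical implementation of guid_extract: returns the part of the
-- URL before the third '/', e.g. 'http://photo.stackexchange.com'.
CREATE FUNCTION guid_extract(guid VARCHAR(2048))
RETURNS VARCHAR(2048)
DETERMINISTIC
NO SQL
RETURN SUBSTRING_INDEX(guid, '/', 3);

-- Usage against the placeholder table name from earlier in the thread.
SELECT guid_extract(guid) AS site, COUNT(post_author) AS author_count
FROM TABLE_NAME
GROUP BY site;
```

Because the function runs once per row, an index on guid is unlikely to help the grouping itself.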

The problem is how to extract the root part of the URL. If we can be sure that every URL has at least 3 slashes, this will work, using substring_index:

select substring_index(guid, '/', 3) as site, count(id) as authors
from TABLE_NAME  -- placeholder: `table` itself is a reserved word in MySQL
group by substring_index(guid, '/', 3);
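
To see what the expression does, it can be run on literal strings. The first URL is taken from the sample data in the question; the second is a made-up value with only two slashes.

```sql
-- Everything to the left of the third '/' is returned.
SELECT SUBSTRING_INDEX(
    'http://photo.stackexchange.com/questions/8/building-an-ice-hockey-net-cam',
    '/', 3) AS site;
-- returns 'http://photo.stackexchange.com'

-- With fewer than three slashes the input comes back unchanged.
SELECT SUBSTRING_INDEX('http://example.com', '/', 3) AS site;
-- returns 'http://example.com'
```

Whether an unchanged value is an acceptable site for short guids depends on what the data actually contains.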

Of course, if you add an extra column holding only the site, populated at insert time, everything will be faster, cleaner and safer (you won't have to complicate the query to handle guids with only two slashes).
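
The answer does not say how the extra column should be populated. One option, assuming MySQL 5.7 or later, is a stored generated column, which the server fills in on every insert or update. The column name `site`, its length, and the index name are hypothetical, and `TABLE_NAME` is a placeholder.

```sql
-- Stored generated column: computed from guid whenever a row is written.
ALTER TABLE TABLE_NAME
    ADD COLUMN site VARCHAR(255)
        GENERATED ALWAYS AS (SUBSTRING_INDEX(guid, '/', 3)) STORED;

-- Index the new column so grouping does not have to parse guid.
CREATE INDEX idx_site ON TABLE_NAME (site);

SELECT site, COUNT(post_author) AS author_count
FROM TABLE_NAME
GROUP BY site;
```

On older MySQL versions the same effect would need the application, or a trigger, to set the column explicitly.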
