SQL query for counting multiple strings with one output
I have a database containing certain strings, such as '{TICKER|IBM}', which I will refer to as ticker-strings. My goal is to count the number of ticker-strings per day for multiple strings.
My database table 'tweets' includes the columns 'tweet_id', 'created_at' (dd/mm/yyyy hh/mm/ss) and 'processed_text'. The ticker-strings, such as '{TICKER|IBM}', are within the 'processed_text' column.
At the moment, I have a working SQL query for counting one ticker-string (thanks to the help of other Stack Overflow users). What I would like is a SQL query that counts multiple strings (for instance '{TICKER|GOOG}' and '{TICKER|BAC}' in addition to '{TICKER|IBM}').
The working SQL query for counting one ticker-string is as follows:
SELECT d.date, IFNULL(t.count, 0) AS tweet_count
FROM all_dates AS d
LEFT JOIN (
SELECT COUNT(DISTINCT tweet_id) AS count, DATE(created_at) AS date
FROM tweets
WHERE processed_text LIKE '%{TICKER|IBM}%'
GROUP BY date) AS t
ON d.date = t.date
The eventual output should thus have a column with the date, a column for {TICKER|IBM}, a column for {TICKER|GOOG} and one for {TICKER|BAC}.
I was wondering whether this is possible and whether you have a solution for it. I have more than 100 different ticker-strings. Of course, doing them one by one is an option, but it is a very time-consuming one.
If I understand correctly, you can do this with conditional aggregation:
SELECT d.date, coalesce(IBM, 0) as IBM, coalesce(GOOG, 0) as GOOG, coalesce(BAC, 0) AS BAC
FROM all_dates d LEFT JOIN
(SELECT DATE(created_at) AS date,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|IBM}%' then tweet_id
END) as IBM,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|GOOG}%' then tweet_id
END) as GOOG,
COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|BAC}%' then tweet_id
END) as BAC
FROM tweets
GROUP BY date
) t
ON d.date = t.date;
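With more than 100 tickers, one column per ticker means one hand-written expression per ticker. A possible alternative, sketched here in MySQL syntax, is to return one row per date and ticker instead. The `tickers` table and its `symbol` column are hypothetical and do not appear in the question; the sketch also assumes `all_dates.date` is a DATE and `created_at` is a DATETIME.

```sql
-- Hypothetical lookup table with one row per ticker symbol (an assumption,
-- not part of the original schema).
CREATE TABLE tickers (symbol VARCHAR(16) PRIMARY KEY);
INSERT INTO tickers (symbol) VALUES ('IBM'), ('GOOG'), ('BAC');

-- One output row per (date, ticker) rather than one column per ticker.
SELECT d.date,
       k.symbol,
       COUNT(DISTINCT t.tweet_id) AS tweet_count  -- 0 when no tweet matches
FROM all_dates AS d
CROSS JOIN tickers AS k
LEFT JOIN tweets AS t
       ON t.created_at >= d.date
      AND t.created_at <  d.date + INTERVAL 1 DAY
      AND t.processed_text LIKE CONCAT('%{TICKER|', k.symbol, '}%')
GROUP BY d.date, k.symbol
ORDER BY d.date, k.symbol;
```

Adding a ticker then only requires inserting a row into the lookup table. The trade-off is that the result is in long format, so turning it into one column per ticker would have to happen in the client or reporting layer.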
I'd return the specified resultset like this, adding an expression to the SELECT list for each "ticker" I want returned as a separate column:
SELECT d.date
, IFNULL(SUM(t.processed_text LIKE '%{TICKER|IBM}%' ),0) AS `cnt_ibm`
, IFNULL(SUM(t.processed_text LIKE '%{TICKER|GOOG}%'),0) AS `cnt_goog`
, IFNULL(SUM(t.processed_text LIKE '%{TICKER|BAC}%' ),0) AS `cnt_bac`
, IFNULL(SUM(t.processed_text LIKE '%{TICKER|...}%' ),0) AS `cnt_...`
FROM all_dates d
LEFT
JOIN tweets t
ON t.created_at >= d.date
AND t.created_at < d.date + INTERVAL 1 DAY
GROUP BY d.date
NOTES: The expressions within the SUM aggregates above are evaluated as booleans, so they return 1 (if true), 0 (if false), or NULL. I'd avoid wrapping the created_at column in a DATE() function, and use a range scan instead, especially if a predicate is added (a WHERE clause) that restricts the values of date being returned from all_dates.
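The two points in the notes can be illustrated with a short MySQL sketch. The first statement shows what a LIKE comparison evaluates to; the second adds a WHERE clause on `all_dates` while keeping `created_at` unwrapped, which generally lets MySQL use an index on that column if one exists. The date literals are made-up values for illustration only.

```sql
-- In MySQL, a LIKE comparison evaluates to 1, 0, or NULL.
SELECT 'x {TICKER|IBM} y' LIKE '%{TICKER|IBM}%' AS is_match,    -- 1
       'no ticker here'   LIKE '%{TICKER|IBM}%' AS no_match,    -- 0
       NULL               LIKE '%{TICKER|IBM}%' AS null_match;  -- NULL

-- Restricting the dates returned: the range condition is applied to
-- created_at directly rather than to DATE(created_at).
SELECT d.date,
       IFNULL(SUM(t.processed_text LIKE '%{TICKER|IBM}%'), 0) AS cnt_ibm
FROM all_dates d
LEFT JOIN tweets t
       ON t.created_at >= d.date
      AND t.created_at <  d.date + INTERVAL 1 DAY
WHERE d.date >= '2014-01-01'   -- hypothetical start date
  AND d.date <  '2014-02-01'   -- hypothetical end date (exclusive)
GROUP BY d.date;
```

The IFNULL wrapper matters for dates with no tweets at all: the LEFT JOIN then yields a single row of NULLs, and SUM over only NULL values returns NULL rather than 0.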
As an alternative, expressions like this will return an equivalent result:
, SUM(IF(t.processed_text LIKE '%{TICKER|IBM}%' ,1,0)) AS `cnt_ibm`