SQL query for counting multiple strings with one output

I have a database containing certain strings, such as '{TICKER|IBM}', which I will refer to as ticker-strings. My goal is to count, per day, the number of tweets containing each of several ticker-strings.

My database table 'tweets' includes the columns 'tweet_id', 'created_at' (dd/mm/yyyy hh:mm:ss) and 'processed_text'. The ticker-strings, such as '{TICKER|IBM}', are within the 'processed_text' column.

At the moment, I have a working SQL query for counting one ticker-string (thanks to the help of other Stack Overflow users). What I would like is a SQL query that counts multiple strings at once (for instance '{TICKER|GOOG}' and '{TICKER|BAC}' in addition to '{TICKER|IBM}').

The working SQL query for counting one ticker-string is as follows:

SELECT d.date, IFNULL(t.count, 0) AS tweet_count
FROM all_dates AS d
LEFT JOIN (
    SELECT COUNT(DISTINCT tweet_id) AS count, DATE(created_at) AS date
    FROM tweets
    WHERE processed_text LIKE '%{TICKER|IBM}%'
    GROUP BY date) AS t
ON d.date = t.date
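To see what this query returns, including the zero rows that the LEFT JOIN produces for days without matches, here is a minimal sketch that runs it against an in-memory SQLite database through Python's built-in sqlite3 module. The rows are made-up toy data, created_at is assumed to be stored as ISO text ('YYYY-MM-DD HH:MM:SS') because SQLite's DATE() does not parse dd/mm/yyyy, and the only changes to the query are GROUP BY spelled out as DATE(created_at) and an added ORDER BY.

```python
import sqlite3

# Toy data (hypothetical); created_at is ISO text so SQLite's DATE() can parse it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_dates (date TEXT PRIMARY KEY);
    CREATE TABLE tweets (tweet_id INTEGER, created_at TEXT, processed_text TEXT);
    INSERT INTO all_dates VALUES ('2015-03-01'), ('2015-03-02'), ('2015-03-03');
    INSERT INTO tweets VALUES
        (1, '2015-03-01 09:15:00', 'buy {TICKER|IBM} now'),
        (2, '2015-03-01 17:40:00', '{TICKER|IBM} and {TICKER|GOOG}'),
        (3, '2015-03-03 08:00:00', 'only {TICKER|GOOG} today');
""")

# The single-ticker query: count distinct tweets per day, then LEFT JOIN onto
# all_dates so days without any match still appear (IFNULL turns NULL into 0).
rows = conn.execute("""
    SELECT d.date, IFNULL(t.count, 0) AS tweet_count
    FROM all_dates AS d
    LEFT JOIN (
        SELECT COUNT(DISTINCT tweet_id) AS count, DATE(created_at) AS date
        FROM tweets
        WHERE processed_text LIKE '%{TICKER|IBM}%'
        GROUP BY DATE(created_at)) AS t
    ON d.date = t.date
    ORDER BY d.date
""").fetchall()

for row in rows:
    print(row)
```

Each additional ticker would need another copy of this query, which is what the question wants to avoid.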

The eventual output should thus have a column with the date, a column for {TICKER|IBM}, a column for {TICKER|GOOG} and one for {TICKER|BAC}.

I was wondering whether this is possible and whether you have a solution for it. I have more than 100 different ticker-strings. Of course, doing them one by one is an option, but a very time-consuming one.

If I understand correctly, you can do this with conditional aggregation:

SELECT d.date, coalesce(IBM, 0) as IBM, coalesce(GOOG, 0) as GOOG, coalesce(BAC, 0) AS BAC
FROM all_dates d LEFT JOIN
     (SELECT DATE(created_at) AS date,
             COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|IBM}%' then tweet_id
                   END) as IBM,
             COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|GOOG}%' then tweet_id
                   END) as GOOG,
             COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|BAC}%' then tweet_id
                   END) as BAC
      FROM tweets
      GROUP BY date
     ) t
     ON d.date = t.date;
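The conditional-aggregation query can be checked the same way. This sketch again assumes Python's sqlite3 module, an in-memory database, invented rows and ISO-formatted created_at text. One tweet mentions two tickers (so it is counted in both columns), and one tweet is deliberately stored twice to show that COUNT(DISTINCT ...) counts it only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_dates (date TEXT PRIMARY KEY);
    CREATE TABLE tweets (tweet_id INTEGER, created_at TEXT, processed_text TEXT);
    INSERT INTO all_dates VALUES ('2015-03-01'), ('2015-03-02'), ('2015-03-03');
    INSERT INTO tweets VALUES
        (1, '2015-03-01 09:15:00', 'buy {TICKER|IBM} now'),
        (2, '2015-03-01 17:40:00', '{TICKER|IBM} and {TICKER|GOOG}'),
        -- tweet 3 is stored twice on purpose: COUNT(DISTINCT) counts it once
        (3, '2015-03-03 08:00:00', '{TICKER|GOOG} vs {TICKER|BAC}'),
        (3, '2015-03-03 08:00:00', '{TICKER|GOOG} vs {TICKER|BAC}');
""")

# One CASE per ticker: it yields tweet_id when the ticker matches, else NULL,
# and COUNT(DISTINCT ...) ignores the NULLs.
rows = conn.execute("""
    SELECT d.date, COALESCE(IBM, 0) AS IBM, COALESCE(GOOG, 0) AS GOOG,
           COALESCE(BAC, 0) AS BAC
    FROM all_dates d LEFT JOIN
         (SELECT DATE(created_at) AS date,
                 COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|IBM}%'
                                     THEN tweet_id END) AS IBM,
                 COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|GOOG}%'
                                     THEN tweet_id END) AS GOOG,
                 COUNT(DISTINCT CASE WHEN processed_text LIKE '%{TICKER|BAC}%'
                                     THEN tweet_id END) AS BAC
          FROM tweets
          GROUP BY DATE(created_at)
         ) t
         ON d.date = t.date
    ORDER BY d.date
""").fetchall()

for row in rows:
    print(row)
```

The outer COALESCE only matters for dates that have no tweets at all; within a day that has tweets, COUNT already returns 0 rather than NULL.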

I'd return the specified result set like this, adding an expression to the SELECT list for each "ticker" I want returned as a separate column:

   SELECT d.date
        , IFNULL(SUM(t.processed_text LIKE '%{TICKER|IBM}%' ),0) AS `cnt_ibm`
        , IFNULL(SUM(t.processed_text LIKE '%{TICKER|GOOG}%'),0) AS `cnt_goog`
        , IFNULL(SUM(t.processed_text LIKE '%{TICKER|BAC}%' ),0) AS `cnt_bac`
        , IFNULL(SUM(t.processed_text LIKE '%{TICKER|...}%' ),0) AS `cnt_...`  -- one such line per additional ticker
     FROM all_dates d
     LEFT
     JOIN tweets t
       ON t.created_at >= d.date
      AND t.created_at < d.date + INTERVAL 1 DAY
    GROUP BY d.date

NOTES: In MySQL, the comparison inside each SUM aggregate above is evaluated as a boolean, so it returns 1 (if true), 0 (if false), or NULL (when processed_text is NULL, e.g. for a date with no tweets at all, where IFNULL then turns the NULL sum into 0). I'd avoid wrapping the created_at column in a DATE() function and use a range condition instead, so that MySQL can do a range scan on an index on created_at, especially if a predicate (a WHERE clause) is added that restricts the values of date being returned from all_dates.
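With 100+ tickers, typing one SUM line per ticker is the time-consuming part, so the SELECT list can be generated instead. Below is a minimal Python sketch that builds the MySQL query text in the style of the answer above; TICKERS and build_ticker_query are hypothetical names, and the symbol whitelist is an assumption about what ticker symbols look like. Because the symbols are pasted into SQL text, anything outside that whitelist is rejected rather than quoted.

```python
import re

# Hypothetical ticker list; the real one (100+ symbols) might come from a file or table.
TICKERS = ["IBM", "GOOG", "BAC"]

# Symbols are pasted into the SQL text, so accept only upper-case letters,
# digits and dots; quotes, spaces and other characters are rejected.
SYMBOL_RE = re.compile(r"[A-Z0-9.]+")


def build_ticker_query(tickers):
    """Build a MySQL query with one `cnt_<ticker>` column per ticker."""
    lines = ["SELECT d.date"]
    for sym in dict.fromkeys(tickers):  # drop duplicate symbols, keep order
        if not SYMBOL_RE.fullmatch(sym):
            raise ValueError(f"unexpected ticker symbol: {sym!r}")
        alias = "cnt_" + sym.lower().replace(".", "_")
        lines.append(
            f"     , IFNULL(SUM(t.processed_text LIKE '%{{TICKER|{sym}}}%'),0) AS `{alias}`"
        )
    # Range join on created_at (no DATE() around the column), as in the answer.
    lines += [
        "  FROM all_dates d",
        "  LEFT JOIN tweets t",
        "    ON t.created_at >= d.date",
        "   AND t.created_at < d.date + INTERVAL 1 DAY",
        " GROUP BY d.date",
    ]
    return "\n".join(lines)


print(build_ticker_query(TICKERS))
```

The printed text can be pasted into a MySQL client or passed to whatever driver is already in use; only the ticker list needs maintaining.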

As an alternative, expressions like this will return an equivalent result:

     , SUM(IF(t.processed_text LIKE '%{TICKER|IBM}%' ,1,0)) AS `cnt_ibm`
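The equivalence, and the NULL handling described in the notes, can be checked with a small SQLite sketch (Python's sqlite3 module, toy data). Two substitutions are assumptions made for SQLite: MySQL's d.date + INTERVAL 1 DAY is written as DATE(d.date, '+1 day'), and MySQL's IF(cond, 1, 0) is written as the standard CASE WHEN cond THEN 1 ELSE 0 END, which means the same thing. For a date with no tweets, the boolean SUM yields NULL (hence the IFNULL), while the IF/CASE form already sums to 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE all_dates (date TEXT PRIMARY KEY);
    CREATE TABLE tweets (tweet_id INTEGER, created_at TEXT, processed_text TEXT);
    INSERT INTO all_dates VALUES ('2015-03-01'), ('2015-03-02');
    INSERT INTO tweets VALUES
        (1, '2015-03-01 09:15:00', 'buy {TICKER|IBM} now'),
        (2, '2015-03-01 17:40:00', '{TICKER|GOOG} only');
""")

# Range join on the raw created_at text (ISO strings compare in date order),
# then two ways of counting IBM mentions per day.
rows = conn.execute("""
    SELECT d.date
         , IFNULL(SUM(t.processed_text LIKE '%{TICKER|IBM}%'), 0) AS cnt_bool
         , SUM(CASE WHEN t.processed_text LIKE '%{TICKER|IBM}%'
                    THEN 1 ELSE 0 END) AS cnt_case
      FROM all_dates d
      LEFT JOIN tweets t
        ON t.created_at >= d.date
       AND t.created_at < DATE(d.date, '+1 day')
     GROUP BY d.date
     ORDER BY d.date
""").fetchall()

for row in rows:
    print(row)
```

Note that this form counts joined rows, not distinct tweet_id values, so it matches the COUNT(DISTINCT ...) answer only when each tweet is stored once.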
