I have a MySQL table named "content" containing (among others) the fields "_date" and "text", for example:
_date text
---------------------------------------------------------
2011-02-18 I'm afraid my car won't start tomorrow
2011-02-18 I hope I'm going to pass my exams
2011-02-18 Exams coming up - I'm not afraid :P
2011-02-19 Not a single f was given this day
2011-02-20 I still hope I passed, but I'm afraid I didn't
2011-02-20 On my way to school :)
I'm looking for a query to count the number of times the words "hope" and "afraid" are being used per day. In other words, the output would have to be something like:
_date word count
-----------------------
2011-02-18 hope 1
2011-02-18 afraid 2
2011-02-19 hope 0
2011-02-19 afraid 0
2011-02-20 hope 1
2011-02-20 afraid 1
Is there an easy way to do this, or should I just write a different query per term? I now have this, but I don't know what to put instead of "?":
SELECT COUNT(?) FROM content WHERE text LIKE '%hope' GROUP BY _date
Can somebody help me with the correct query for this?
I think the easiest and most readable way is to write one query per word and combine them with UNION:
Select
_date, 'hope' as word,
sum( case when `text` like '%hope%' then 1 else 0 end) as n
from content
group by _date
UNION
Select
_date, 'afraid' as word,
sum( case when `text` like '%afraid%' then 1 else 0 end) as n
from content
group by _date
This approach does not have the best performance. If performance matters, you should group by day in a subquery first; the LIKE condition is also a performance killer, because a pattern with a leading % cannot use an index. This solution is fine if you only run the query occasionally, in batch mode. Describe your performance requirements for a more accurate solution.
EDITED TO MATCH THE OP'S LATEST REQUIREMENT
Your query is almost correct:
SELECT _date, 'hope' AS word, COUNT(*) as count
FROM content WHERE text LIKE '%hope%' GROUP BY _date
Use %hope% to match the word anywhere (not only at the end of the string). COUNT(*) should do what you want.
To get multiple words from a single query, use UNION ALL.
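As a rough sketch of the UNION ALL suggestion, the per-word query can simply be repeated once per word. The parentheses and the final ORDER BY are my own addition so that the sort applies to the combined result. Note that days without a match still produce no row here.

```sql
-- One SELECT per word, stacked with UNION ALL.
-- UNION ALL skips the duplicate-removal step that plain UNION performs,
-- which is safe here because the 'word' column already makes rows distinct.
(SELECT _date, 'hope' AS word, COUNT(*) AS count
   FROM content
  WHERE `text` LIKE '%hope%'
  GROUP BY _date)
UNION ALL
(SELECT _date, 'afraid' AS word, COUNT(*) AS count
   FROM content
  WHERE `text` LIKE '%afraid%'
  GROUP BY _date)
ORDER BY _date, word;
```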
Another approach is to create a sequence of words on the fly and use it as the second table in a join:
SELECT _date, words.word, COUNT(*) as count
FROM (
SELECT 'hope' AS word
UNION
SELECT 'afraid' AS word
) AS words
CROSS JOIN content
WHERE text LIKE CONCAT('%', words.word, '%')
GROUP BY _date, words.word
Note that this only counts a single occurrence of each word per row. So »I hope there is still hope« will only give you 1, and not 2.
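If repeated occurrences within one row should be counted as well, one common workaround is to compare the length of the text before and after removing the word. This is only a sketch, not part of the answer above. It assumes the search word is given in lowercase, and like the LIKE version it also counts substrings (for example "hopeless").

```sql
-- Occurrences of 'hope' per day, counting repeats inside a single row.
-- Removing every 'hope' shortens the text by 4 characters per occurrence,
-- so the length difference divided by the word length is the hit count.
-- LOWER() is applied because REPLACE() is case-sensitive.
SELECT _date,
       'hope' AS word,
       SUM(
         (CHAR_LENGTH(LOWER(`text`))
            - CHAR_LENGTH(REPLACE(LOWER(`text`), 'hope', '')))
         DIV CHAR_LENGTH('hope')
       ) AS occurrences
FROM content
GROUP BY _date;
```

The same expression can replace COUNT(*) in the CROSS JOIN query by substituting words.word for the literal 'hope'.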
To get 0 when there are no matches, join the previous result with the dates again:
SELECT dates._date, COALESCE(result.word, 'no match') AS word, COALESCE(result.count, 0) AS count
FROM (
-- one row per day; joining against content directly would repeat
-- every result row once for each content row of that day
SELECT DISTINCT _date FROM content
) AS dates
LEFT JOIN (
SELECT _date, words.word, COUNT(*) as count
FROM (
SELECT 'hope' AS word
UNION
SELECT 'afraid' AS word
) AS words
CROSS JOIN content
WHERE text LIKE CONCAT('%', words.word, '%')
GROUP BY _date, words.word ) AS result
ON dates._date = result._date
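The join above reports a day without any match as a single 'no match' row. If one row per day and per word is wanted instead, as in the expected output of the question, a possible variant (my own sketch, not the answerer's query) moves the LIKE test out of the WHERE clause and into the aggregate, so that non-matching rows contribute 0 instead of being filtered away.

```sql
-- Every (day, word) pair appears exactly once, with 0 when nothing matched.
-- In MySQL a comparison evaluates to 1 or 0, so SUM() over the LIKE test
-- counts the matching rows. COALESCE covers rows where `text` is NULL.
SELECT c._date,
       w.word,
       COALESCE(SUM(c.`text` LIKE CONCAT('%', w.word, '%')), 0) AS count
FROM content AS c
CROSS JOIN (
    SELECT 'hope' AS word
    UNION ALL
    SELECT 'afraid'
) AS w
GROUP BY c._date, w.word
ORDER BY c._date, w.word;
```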
Assuming you want to count all words and find the most used ones (rather than counting a few specific words), you might want to try something like the following stored procedure (string splitting courtesy of this blog post):
DROP PROCEDURE IF EXISTS wordsUsed;
DELIMITER //
CREATE PROCEDURE wordsUsed ()
BEGIN
  -- Scratch table that receives one row per extracted word.
  DROP TEMPORARY TABLE IF EXISTS wordTmp;
  CREATE TEMPORARY TABLE wordTmp (word VARCHAR(255));
  SET @wordCt = 0;
  SET @tokenCt = 1;
  -- Pass N extracts the Nth space-separated word of every row that has one.
  -- The WHERE clause checks the first N-1 words, so the last word of each
  -- row is included as well.
  contentLoop: LOOP
    SET @stmt = 'INSERT INTO wordTmp SELECT REPLACE(SUBSTRING(SUBSTRING_INDEX(`text`, " ", ?),
      CHAR_LENGTH(SUBSTRING_INDEX(`text`, " ", ? - 1)) + 1),
      " ", "") word
      FROM content
      WHERE CHAR_LENGTH(SUBSTRING_INDEX(`text`, " ", ? - 1)) != CHAR_LENGTH(`text`)';
    PREPARE cmd FROM @stmt;
    EXECUTE cmd USING @tokenCt, @tokenCt, @tokenCt;
    -- Number of words inserted in this pass; 0 means every row is exhausted.
    SELECT ROW_COUNT() INTO @wordCt;
    DEALLOCATE PREPARE cmd;
    IF (@wordCt = 0) THEN
      LEAVE contentLoop;
    ELSE
      SET @tokenCt = @tokenCt + 1;
    END IF;
  END LOOP;
  SELECT word, count(*) usageCount FROM wordTmp GROUP BY word ORDER BY usageCount DESC;
END //
DELIMITER ;
CALL wordsUsed();
You might want to write another query (or procedure) or add some nested "REPLACE" statements to further remove punctuation from the resulting temp table of words, but this should be a good start.
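For illustration, the nested REPLACE idea could look roughly like this when run in the same session after CALL wordsUsed(), while the temporary table wordTmp still exists. The four punctuation characters and the LOWER() call are assumptions on my part; extend the list to whatever your data needs.

```sql
-- Strip a few common punctuation marks and fold case before counting,
-- so that "hope", "Hope" and "hope," end up in the same group.
SELECT LOWER(
         REPLACE(REPLACE(REPLACE(REPLACE(word, ',', ''), '.', ''), '!', ''), '?', '')
       ) AS cleaned,
       COUNT(*) AS usageCount
FROM wordTmp
GROUP BY cleaned
ORDER BY usageCount DESC;
```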