简体   繁体   中英

SQL CODE to do: Trending topics equals a twitter

If i wants count the matching words in a rows of two tables, with milions of rows, sample:

Table posts, sample:

+----+---------+-----------------------------+
| ID | ID_user | text                        | 
+----+---------+-----------------------------+
| 1  | bruno   | michael jackson is dead     |
| 2  | thomasi | michael j. moonwalk is dead |
| 3  | userts  | michael jackson lives       |
+----+---------+-----------------------------+

i want query the words most repeated on the table, limit top 10, the result may be this:

+-------+------------+
| count | word       |
+-------+------------+
| 3     | michael    |
| 2     | dead       |
| 2     | jackson    |
| 1     | j.         |
| 1     | lives      |
| 1     | moonwalk   |
+-------+------------+

but i want search only words that repeat more of 10 times, in this case noone word is appear, but if criteria for repetead words is 2, will display only 'michael' and 'dead', but ignore 'is' because i dont want words with less 2 chars of lenght, and the words that a phrase, then i need apear this:

+-------+-----------------+
| count | word            |
+-------+-----------------+
| 2     | michael jackson |
| 2     | dead            |
+-------+-----------------+

i need a code in mysql that replies the "trending topics" of twitter for posts of my site.

What you're looking for is term extraction , which isn't provided natively within MySQL.

Some other platforms provide that function, but it's considered an enterprise feature, so you'll have to pay through the nose for it.

Alternatively, you can use something like Yahoo!'s Term Extraction API .

Here is a blog post that talks about using Yahoo!'s service from PHP5.

break the sentence up on insert, filter the words against a blacklist, store distinct words with a count (or probably with references). count using count() :)

this would generate a lot of data tough, and i don't know what the speed and storage implications are.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM