
Use Ruby to create a list of commonly used words or phrases

Looking for some advice on generating a list of commonly used words and phrases from a bunch of entries in a NoSQL database. Basically, we have a bunch of posts made by someone and we want to tell them "Hey there. You use these words / phrases a lot".

I'm a bit stumped on this one.

My application is Ruby on Rails, Backbone.js and Redis.

Since it's not clear how the posts are stored, I'll just assume you can get an array of all the posts.

A simple algorithm to find the most common uncommon words would be the following: Iterate over the array of all the posts; for each post, strip out everything but the words and split it into words. Go over all the words in the post and add 1 to the number of times you've seen that word. Once that's done for all the words in all your posts, you'll have a hash with the number of occurrences of each word. Remove the most common words; here's an example of 100 common words. You should probably use more in your application. Sort what's left by the number of occurrences, highest first, and you'll have the most commonly occurring words.
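Here's a minimal sketch of those steps, assuming the posts are already available as an array of strings. The method name `common_words`, the `limit` parameter and the tiny `STOP_WORDS` list are made up for illustration; a real stop list would be much longer.

```ruby
# A small illustrative stop list (an assumption); use a much longer one in practice.
STOP_WORDS = %w[the a an and or of to in is it i you that this for on with].freeze

# Returns [word, count] pairs, most frequent first, with stop words removed.
def common_words(posts, stop_words: STOP_WORDS, limit: 10)
  counts = Hash.new(0)
  posts.each do |post|
    # Lowercase the post and keep only runs of letters (with inner apostrophes).
    post.downcase.scan(/[a-z]+(?:'[a-z]+)*/).each { |word| counts[word] += 1 }
  end
  counts
    .reject { |word, _| stop_words.include?(word) }  # drop the very common words
    .sort_by { |word, count| [-count, word] }        # highest count first, ties alphabetical
    .first(limit)
end

posts = [
  "I love Ruby and I love Rails",
  "Ruby is the language I love"
]
p common_words(posts, limit: 3)
# => [["love", 3], ["ruby", 2], ["language", 1]]
```

The regex only matches ASCII letters, so posts in other languages would need a different pattern.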

It's implemented here. It doesn't treat forms such as "posts" and "post" as the same word, which you might want. You could look into how Rails implements String#singularize to get this behavior.
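If you'd rather not pull in Rails' inflector just for this, a very crude alternative is to fold a word ending in "s" into its shorter form only when that shorter form also shows up in the posts. This is a rough sketch, not what Rails does; `fold_plurals` is a made-up name, and it will miss irregular plurals ("people") and "-es" / "-ies" endings.

```ruby
# Folds the count of a word ending in "s" into its s-less form,
# but only when that shorter form was itself seen in the posts.
# counts is a Hash of word => number of occurrences.
def fold_plurals(counts)
  folded = Hash.new(0)
  counts.each do |word, count|
    singular = word.end_with?("s") ? word[0...-1] : word
    # Keep the word as-is unless its candidate singular is a known word.
    key = counts.key?(singular) ? singular : word
    folded[key] += count
  end
  folded
end

counts = { "post" => 3, "posts" => 2, "class" => 1 }
p fold_plurals(counts)
```

In this example "posts" is merged into "post", while "class" is left alone because "clas" never appears.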

If you wanna find commonly used phrases, it gets more interesting: you'd probably have to use some kind of natural language processing, as @sawa pointed out in a comment. I can't come up with a solution that is fast enough off the top of my head.
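Short of real natural language processing, one simple starting point is to count every run of n consecutive words (n-grams) and keep the ones that repeat. This is only a sketch under that assumption: `common_phrases`, `n` and `min_count` are made-up names, it does nothing clever about stop words or grammar, and whether it's fast enough depends on how many posts you have.

```ruby
# Counts every run of n consecutive words within each post and returns
# [phrase, count] pairs seen at least min_count times, most frequent first.
def common_phrases(posts, n: 2, min_count: 2)
  counts = Hash.new(0)
  posts.each do |post|
    words = post.downcase.scan(/[a-z]+(?:'[a-z]+)*/)
    # each_cons yields every window of n adjacent words; phrases never span two posts.
    words.each_cons(n) { |gram| counts[gram.join(" ")] += 1 }
  end
  counts
    .select { |_, count| count >= min_count }
    .sort_by { |phrase, count| [-count, phrase] }
end

posts = [
  "to be honest I like Ruby",
  "to be honest Rails is fine",
  "I like Ruby a lot"
]
p common_phrases(posts, n: 3)
# => [["i like ruby", 2], ["to be honest", 2]]
```

You could run it for n = 2, 3 and 4 and show the user the longest phrases that still repeat.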
