
Representation and a good similarity measure between Tweets for topic detection

Original post, 2013-02-06 10:06:49. Tags: twitter, machine-learning, cluster-analysis, information-retrieval, topic-modeling

I'm planning to write a tool for topic detection on Twitter. I've been thinking about a good similarity measure (distance) between two tweets, and about how to represent them, taking into account:

The #hashtags (I think hashtags are very important when detecting topics on Twitter).
The replies (if someone replies to a tweet, the two tweets are probably about the same topic, although a conversation could start with the Samsung Galaxy and end with iPhone jailbreaking, etc.).
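A minimal sketch of a tweet representation that takes both signals into account. Everything here is an assumption for illustration: the `Tweet` class and its field names, the word-level Jaccard text similarity, and the 0.5/0.3/0.2 weights are hypothetical and would need tuning on labelled data. A reply link only adds a bonus instead of forcing a match, since a conversation can drift off topic.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tweet:
    # Hypothetical minimal representation; real Twitter data has many more fields.
    tweet_id: str
    text: str
    in_reply_to: Optional[str] = None  # id of the tweet this one replies to, if any
    hashtags: set = field(default_factory=set)

    def __post_init__(self):
        # If no hashtags were given, extract them from the text (lower-cased).
        if not self.hashtags:
            self.hashtags = {h.lower() for h in re.findall(r"#(\w+)", self.text)}


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def words(text: str) -> set:
    # Lower-cased word tokens, skipping #hashtags and @mentions.
    return set(re.findall(r"(?<![#@\w])\w+", text.lower()))


def tweet_similarity(t1: Tweet, t2: Tweet,
                     w_text: float = 0.5, w_tags: float = 0.3,
                     w_reply: float = 0.2) -> float:
    # Hypothetical weights; they sum to 1 so the score stays in [0, 1].
    text_sim = jaccard(words(t1.text), words(t2.text))
    tag_sim = jaccard(t1.hashtags, t2.hashtags)
    # A direct reply link is strong, but not conclusive, evidence of a shared topic.
    replied = t1.in_reply_to == t2.tweet_id or t2.in_reply_to == t1.tweet_id
    return w_text * text_sim + w_tags * tag_sim + w_reply * (1.0 if replied else 0.0)


a = Tweet("1", "Loving the new #galaxy camera")
b = Tweet("2", "@bob the #Galaxy camera is great", in_reply_to="1")
print(round(tweet_similarity(a, b), 3))  # 0.667
```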

I'm thinking about implementing what I have so far and running some experiments. I'll implement the classic vector-space models (such as TF-IDF weighting with Euclidean distance, cosine similarity, etc.) and the Boolean models with a few similarity measures (Hamming, Jaccard, etc.).
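Both families of models can be prototyped with the Python standard library alone. This sketch assumes a simple regex tokenizer that keeps the `#` on hashtags so they remain distinct terms, and the plain `tf * log(N / df)` weighting, which is only one of several common TF-IDF variants; the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case word tokens; the leading '#' is kept so hashtags stay separate terms.
    return re.findall(r"#?\w+", text.lower())


def tfidf_vectors(docs):
    """One sparse {term: weight} dict per document, weighted tf * log(N / df)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(toks).items()}
            for toks in tokenized]


def cosine(u, v):
    # Cosine of the angle between two sparse vectors (0.0 if either is all zeros).
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def euclidean(u, v):
    terms = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms))


# Boolean model: each tweet is just the set of terms it contains.
def hamming(a, b):
    # Positions where the two bit vectors differ = size of the symmetric difference.
    return len(a ^ b)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


tweets = ["new iphone jailbreak released",
          "how to jailbreak the new iphone",
          "samsung galaxy camera review"]
vecs = tfidf_vectors(tweets)
print(cosine(vecs[0], vecs[1]), euclidean(vecs[0], vecs[1]))
a, b = set(tokenize(tweets[0])), set(tokenize(tweets[1]))
print(hamming(a, b), jaccard(a, b))
```

Cosine similarity normalizes for vector length, which is why it is usually preferred over raw Euclidean distance for TF-IDF text vectors. Because tweets are so short, many pairs share no terms at all and score 0 under either model.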

Any ideas on how to adapt an existing model to Twitter, or on how to create a new one?

1 answer

"Similarity Metrics on Twitter" discusses some of the different similarity measures you can use to cluster Twitter data. We did some research on clustering Twitter users based on user connections, user mentions, geolocation, content similarity between tweets, content similarity between user descriptions, and common #hashtags.
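One way to turn several per-user signals like these into a single score is a weighted sum of per-feature similarities. The following is a hypothetical sketch, not the method used in the research mentioned above: the `User` fields, the Jaccard overlaps, the exponential distance decay (`scale_km`), and the values in `WEIGHTS` are all assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class User:
    # Hypothetical per-user features aggregated from tweets and profile.
    follows: set = field(default_factory=set)            # accounts the user follows
    mentions: set = field(default_factory=set)           # accounts the user @-mentions
    location: Optional[tuple] = None                     # (lat, lon) in degrees
    tweet_words: set = field(default_factory=set)        # vocabulary of the user's tweets
    description_words: set = field(default_factory=set)  # vocabulary of the profile bio
    hashtags: set = field(default_factory=set)


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def haversine_km(p, q):
    # Great-circle distance on a sphere with Earth's mean radius (6371 km).
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def geo_similarity(p, q, scale_km=100.0):
    # 1.0 at the same spot, decaying toward 0; unknown locations score 0.
    if p is None or q is None:
        return 0.0
    return math.exp(-haversine_km(p, q) / scale_km)


# Hypothetical weights (they sum to 1); in practice they would be tuned or learned.
WEIGHTS = {"connections": 0.25, "mentions": 0.15, "geo": 0.1,
           "content": 0.2, "description": 0.1, "hashtags": 0.2}


def user_similarity(u, v, weights=WEIGHTS):
    scores = {
        "connections": jaccard(u.follows, v.follows),
        "mentions": jaccard(u.mentions, v.mentions),
        "geo": geo_similarity(u.location, v.location),
        "content": jaccard(u.tweet_words, v.tweet_words),
        "description": jaccard(u.description_words, v.description_words),
        "hashtags": jaccard(u.hashtags, v.hashtags),
    }
    return sum(weights[k] * s for k, s in scores.items())


alice = User(follows={"nasa", "esa"}, hashtags={"#mars"}, location=(40.42, -3.70))
bob = User(follows={"nasa"}, hashtags={"#mars", "#space"}, location=(41.39, 2.17))
print(round(user_similarity(alice, bob), 3))
```

Since every per-feature score lies in [0, 1] and the weights sum to 1, `1 - user_similarity(u, v)` can serve as a distance for a standard clustering algorithm.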

For finding common topics on Twitter, finding connections between the users discussing those topics really helps: we found that groups of users tend to discuss a common topic. There are more details on this in the second half of this post.
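A crude way to exploit those user connections is to link users who mention or reply to each other, take the connected components of that graph as candidate groups, and label each group with its most frequent hashtags. This sketch uses plain connected components as a simple stand-in for a proper community-detection algorithm; the edge list, the `user_hashtags` mapping, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def connected_groups(edges):
    """Connected components of an undirected user-interaction graph.

    edges: iterable of (user_a, user_b) pairs, e.g. one per mention or reply.
    """
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in graph:
        if start in seen:
            continue
        # Iterative depth-first search collects every user reachable from `start`.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            component.add(node)
            for nxt in graph[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(component)
    return groups


def group_topics(groups, user_hashtags, top_n=3):
    """Most common hashtags per group, as a rough label for what it discusses."""
    labels = []
    for group in groups:
        counts = Counter(tag for user in group for tag in user_hashtags.get(user, ()))
        labels.append([tag for tag, _ in counts.most_common(top_n)])
    return labels


edges = [("ana", "ben"), ("ben", "cai"), ("dan", "eve")]
user_hashtags = {"ana": ["#iphone", "#jailbreak"], "ben": ["#iphone"],
                 "cai": ["#jailbreak", "#iphone"], "dan": ["#galaxy"],
                 "eve": ["#galaxy", "#android"]}
print(group_topics(connected_groups(edges), user_hashtags))
# [['#iphone', '#jailbreak'], ['#galaxy', '#android']]
```

On real Twitter data most active users will likely fall into one giant component, so a community-detection method (for example, modularity-based clustering) would be needed to split it into topic-sized groups.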

Related questions

Measuring similarity between short texts such as tweets

Removing tweets with partial similarity

Measure how hot a topic is on Twitter

Removing common junks from tweets for topic modeling

Tweets scraping - how to measure tweeting intensity?

Jaccard distance between tweets

What's a good set of heuristics for threading tweets?

Is it possible to get the most popular tweets in a topic using the Twitter API?

Display tweets between two dates

Are there any good JQuery twitter widgets which loop over tweets?


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint this content, please credit the site URL or the original address. For any questions, contact yoyou2525@163.com.


Answer 1 (score 5), posted 2013-02-06 11:48:22
