
Clustering tweets as positive or negative using the k-means algorithm

I have some movie reviews and need to cluster them into positive and negative clusters. Is this possible with k-means? Can anyone give me a basic outline of how to start? Python is preferred.

You cannot cluster "as positive or negative".

You have labels, so use classification.

k-means will not be able to identify what is "positive". It may find any pattern, e.g. short vs. long or English vs. Spanish tweets; if you are lucky, you can work out which pattern it found.
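A minimal sketch of the classification route, assuming scikit-learn is installed. The four reviews and their labels are hypothetical toy data, and TF-IDF features with a multinomial naive Bayes classifier are just one common choice, not the only way to do it:

```python
# Minimal sentiment-classification sketch with scikit-learn.
# The reviews and labels below are hypothetical toy data for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reviews = [
    "great movie, loved the acting",
    "wonderful story and great cast",
    "terrible plot, boring and too long",
    "awful film, I hated it",
]
labels = ["positive", "positive", "negative", "negative"]

# TF-IDF turns each review into a sparse vector of word weights;
# naive Bayes then learns which words go with which label.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(reviews, labels)

predictions = model.predict(["loved it, great acting", "boring and awful"])
print(predictions.tolist())  # ['positive', 'negative']
```

With real data you would train on many labelled reviews and check accuracy on a held-out split (for example with `sklearn.model_selection.train_test_split`).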

You can start with the sklearn (scikit-learn) package, a well-known machine learning library. It provides sklearn.cluster.KMeans.

Here is an example from the scikit-learn website.
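A minimal sketch of the same idea with `sklearn.cluster.KMeans`, assuming the reviews are plain strings. The toy reviews are hypothetical, and TF-IDF is one assumed way of turning text into the numeric vectors k-means needs. The output is only cluster ids 0 and 1, not sentiment labels:

```python
# Minimal k-means sketch with scikit-learn: vectorize the reviews, then ask for 2 clusters.
# The reviews are hypothetical toy data for illustration.
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "great movie, loved the acting",
    "wonderful story and great cast",
    "terrible plot, boring and too long",
    "awful film, I hated it",
]

# k-means needs numeric vectors, so turn each review into TF-IDF word weights first.
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(reviews)

# random_state makes the run repeatable; n_init=10 restarts from different
# initial centers and keeps the best result.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)

# Each review gets a cluster id; whether either id lines up with sentiment
# has to be checked by hand, since k-means only groups similar vectors.
for review, cid in zip(reviews, cluster_ids):
    print(cid, review)
```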

Although you prefer Python, R is also a good statistical tool for this. It has a kmeans(x, centers) function that is built in (part of the stats package, which R loads by default), so you do not need to install or load any package. All you need to do is read the data and run it:

x = read.table(file, sep = '\t')  # kmeans() needs numeric columns, e.g. word counts per review

y = kmeans(x, centers = 2)
