
Apply k-means to examine differences between two groups in R

I have two groups. The treatment group was exposed to media; the control group was not. They are distinguished by a categorical variable in the data frame (exposure to media = 1, no media = 0).

Now, I want to examine whether there are any clear differences between these two groups. To do this, I need to apply the k-means algorithm with two clusters to four variables: proportion of Black population, proportion of male population, proportion of Hispanic population, and median income on the logarithmic scale.

How can I do this in R? Could anyone give me some hints? Thanks!

Try this:

km <- kmeans(your_data, centers = 2, nstart = 10)  # your_data: placeholder for your numeric columns

Here your_data is a data.frame: either your whole data set or just the numeric variables you are interested in. You also need to choose the number of clusters (2 here). A good way to understand your data is to try several different numbers of clusters and see which one fits better, using a criterion such as AIC or BIC.
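As a rough sketch of trying several numbers of clusters: kmeans() does not report AIC or BIC itself, so this example swaps in the total within-cluster sum of squares (the "elbow" heuristic) instead. The data are simulated and the column names (media, prop_black, prop_male, prop_hisp, log_income) are made up for illustration; substitute your own.

```r
# Simulated stand-in data; every column name here is hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(
  media      = rbinom(n, 1, 0.5),              # 1 = exposed to media, 0 = not
  prop_black = runif(n),
  prop_male  = runif(n, 0.4, 0.6),
  prop_hisp  = runif(n),
  log_income = rnorm(n, mean = 10.5, sd = 0.5)
)

# Cluster on the four variables only, leaving the group indicator out.
vars <- c("prop_black", "prop_male", "prop_hisp", "log_income")
x <- scale(dat[, vars])  # standardize so no variable dominates the distances

# Total within-cluster sum of squares for k = 1, ..., 6
ks  <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)
print(data.frame(k = ks, tot_withinss = wss))
```

Look for the k after which tot_withinss stops dropping sharply. On purely random data like this there may be no clear elbow, which is itself informative.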

k-means is a method for clustering data. The idea is that the observations come from different distributions, and we would like to know which distribution each observation comes from.
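To tie this back to the two groups in the question, one option is to cross-tabulate the cluster assigned by k-means against the media indicator. If the clusters line up with the groups, most of each row falls into one column. This is only a sketch on simulated data, and the column names are again hypothetical.

```r
# Simulated stand-in data; every column name here is hypothetical.
set.seed(42)
n <- 200
dat <- data.frame(
  media      = rbinom(n, 1, 0.5),              # 1 = exposed to media, 0 = not
  prop_black = runif(n),
  prop_male  = runif(n, 0.4, 0.6),
  prop_hisp  = runif(n),
  log_income = rnorm(n, mean = 10.5, sd = 0.5)
)
vars <- c("prop_black", "prop_male", "prop_hisp", "log_income")

# Two clusters on the standardized variables (media is not used for clustering)
km <- kmeans(scale(dat[, vars]), centers = 2, nstart = 10)

# Rows: media group, columns: assigned cluster
tab <- table(media = dat$media, cluster = km$cluster)
print(tab)
print(prop.table(tab, margin = 1))  # share of each group in each cluster

# Cluster means on the original scale, to see what separates the clusters
centers <- aggregate(dat[, vars], by = list(cluster = km$cluster), FUN = mean)
print(centers)
```

Keep in mind that the cluster labels 1 and 2 are arbitrary, so cluster 1 does not have to correspond to media = 0.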

You can also have a look at the many tutorials on k-means in R, for example:

https://onlinecourses.science.psu.edu/stat857/node/125

https://www.r-statistics.com/2013/08/k-means-clustering-from-r-in-action/

http://www.statmethods.net/advstats/cluster.html
