K-means clustering in R

Question

I'm a beginner in R and I followed a tutorial on K-means clustering. Now I'm trying to run the algorithm on real data, and I chose the exoplanet catalog: http://exoplanet.eu/catalog/

I loaded the data like this:

d <- read.csv2(
    "exoplanet.eu_catalog.csv",
    header = TRUE,
    sep = ","
)

With this code:

plot(
    x = log(as.numeric(as.character(d$semi_major_axis))),
    y = log(as.numeric(as.character(d$mass))),
    xlab = "Star-exoplanet distance (log(AU))",
    ylab = "Mass of exoplanets (log(M[Jupiter]))"
)

I get the following plot:

I'd like to run the K-means clustering algorithm on these points and show three clusters in different colors, but I don't know how to proceed in R. I suppose I have to begin with:

y = log(as.numeric(as.character(d$mass)))
y <- y[!is.na(y)]
x = log(as.numeric(as.character(d$semi_major_axis)))
x <- x[!is.na(x)]

But I don't know how to format the data into a matrix in order to run kmeans(matrix, 3, nstart = 20). Any clue, please?

Answer 1

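Since you read your file using

d <- read.csv2("exoplanet.eu_catalog.csv",
               header = TRUE,
               sep = ",")

your data is a data frame, and you need to convert it to a numeric matrix before passing it to kmeans().

Use this code to convert a data frame into a matrix:

inMatrixForm <- data.matrix(d)

Note that data.matrix() converts every column, replacing non-numeric columns with integer codes, and that kmeans() stops with an error if the matrix contains missing values. For clustering it is therefore safer to keep only the columns you need and drop incomplete rows first.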
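For the two variables plotted in the question, a minimal sketch could look like the following. The helper name cluster_exoplanets and the synthetic demo data frame are made up for illustration (the demo values are not real catalog data); the column names semi_major_axis and mass are taken from the question. The key point is that rows are filtered on both variables at once, so x and y stay paired, which removing NA values from each vector separately would not guarantee.

```r
# Hypothetical helper: cluster on log(semi-major axis) vs log(mass).
# `d` is assumed to have columns `semi_major_axis` and `mass`, as in the question.
cluster_exoplanets <- function(d, k = 3, nstart = 20) {
  # Go through character so factor columns yield their values, not level codes.
  x <- suppressWarnings(log(as.numeric(as.character(d$semi_major_axis))))
  y <- suppressWarnings(log(as.numeric(as.character(d$mass))))

  # Keep only rows where BOTH values are finite, so x and y stay paired.
  keep <- is.finite(x) & is.finite(y)
  m <- cbind(log_distance = x[keep], log_mass = y[keep])

  fit <- kmeans(m, centers = k, nstart = nstart)
  list(data = m, fit = fit)
}

set.seed(42)  # kmeans uses random starts; fix the seed for reproducibility

# Synthetic stand-in for the catalog (NOT real exoplanet data):
# 60 complete rows plus one row with a missing distance.
demo <- data.frame(
  semi_major_axis = c(as.character(exp(rnorm(60, rep(c(-3, 0, 2), each = 20), 0.3))), ""),
  mass            = c(as.character(exp(rnorm(60, rep(c(-2, 0, 2), each = 20), 0.3))), "1.5")
)

res <- cluster_exoplanets(demo)

# Color each point by its cluster and mark the cluster centers with crosses.
plot(res$data, col = res$fit$cluster, pch = 19,
     xlab = "Star-exoplanet distance (log(AU))",
     ylab = "Mass of exoplanets (log(M[Jupiter]))")
points(res$fit$centers, pch = 4, cex = 2, lwd = 2)
```

With the real catalog, you would call cluster_exoplanets(d) on the data frame you loaded instead of demo.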
