I am very new to R. I am doing cluster analysis on latitude and longitude data and then want to plot the results on a Google map. My data set is very small: only 20 points. I want to use the k-means algorithm, and for the distance calculation I want to use the Haversine distance (https://www.slideshare.net/AnbarasanS2/clusteranalysis-58192369). I also tried density-based clustering, but it gave very poor results, so I want to stay with k-means. My dataset and code are given below:
ID Latitude Longitude
1 27.9745 79.0028
2 29.4716 77.7642
3 30.9688 76.5256
4 29.4716 77.7642
5 29.4716 77.7642
6 29.4716 77.7642
7 29.4716 77.7642
8 25.5648 83.4477
9 26.2946 79.041
10 22.5293 77.178
11 26.2946 79.041
12 30.7896 76.4973
13 26.2946 79.041
14 28.1856 72.2447
15 28.1856 72.2447
16 28.1856 72.2447
17 28.1856 72.2447
18 28.1856 72.2447
19 28.1856 72.2447
20 28.1856 72.2447
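Since test.csv itself is not included in the post, here is a minimal sketch that rebuilds the same 20 rows inline so the code can be run as-is. The column names id, lat and lon are assumptions (the file's real header is not shown). The sketch also counts how many distinct locations the rows contain, because several points share identical coordinates.

```r
# Rebuild the 20 points from the post without test.csv.
# Column names id/lat/lon are assumed; the real file's header is not shown.
geodata <- read.table(header = TRUE, text = "
id lat lon
1 27.9745 79.0028
2 29.4716 77.7642
3 30.9688 76.5256
4 29.4716 77.7642
5 29.4716 77.7642
6 29.4716 77.7642
7 29.4716 77.7642
8 25.5648 83.4477
9 26.2946 79.041
10 22.5293 77.178
11 26.2946 79.041
12 30.7896 76.4973
13 26.2946 79.041
14 28.1856 72.2447
15 28.1856 72.2447
16 28.1856 72.2447
17 28.1856 72.2447
18 28.1856 72.2447
19 28.1856 72.2447
20 28.1856 72.2447
")

# Many rows repeat the same coordinates; count the distinct locations.
n_distinct <- nrow(unique(geodata[, c("lat", "lon")]))
print(n_distinct)
```

With this data n_distinct is 8, and that caps the number of centers kmeans() can be asked for.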
Code:
geodata = read.csv('test.csv')
# K-means clustering
# Compute the distance matrix using the geosphere package.
geo.dist <- function(df) {
  require(geosphere)
  d <- function(i, z) {
    dist <- rep(0, nrow(z))
    dist[i:nrow(z)] <-
      distHaversine(z[i:nrow(z), 1:2], z[i, 1:2])
    return(dist)
  }
  dm <- do.call(cbind, lapply(1:nrow(df), d, df))
  return(as.dist(df))
}
distance.matrix <- geo.dist(geodata[, c(2, 3)])
# Determine the number of clusters
wssplot.distancematrix <- function(data, nc = 15, seed = 1234) {
  wss <- rep(0, 15)
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc, wss,
       type = "b")
}
wssplot.distancematrix(distance.matrix)
But I got this error:
Error in dimnames(df) <- if (is.null(labels)) list(seq_len(size), seq_len(size)) else list(labels,  :
  length of 'dimnames' [1] not equal to array extent
In addition: Warning message:
In df[row(df) > col(df)] <- x :
How can I run k-means clustering on these points and plot the resulting clusters on a Google map?
Thanks in advance.
Regards, Nikita
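There are two errors in the code; both are marked with comments below. The error message itself comes from as.dist(df): it builds a dist object from the 20-by-2 coordinate data frame instead of the 20-by-20 matrix dm, and that malformed object fails when kmeans() converts it back to a matrix.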
geo.dist <- function(df) {
  require(geosphere)
  d <- function(i, z) {
    dist <- rep(0, nrow(z))
    dist[i:nrow(z)] <-
      distHaversine(z[i:nrow(z), 1:2], z[i, 1:2])
    return(dist)
  }
  dm <- do.call(cbind, lapply(1:nrow(df), d, df))
  return(as.dist(dm))  # return should be dm, not df
}
distance.matrix <- geo.dist(geodata[, c(2, 3)])
# Determine the number of clusters
wssplot.distancematrix <- function(data, nc = 8, seed = 1234) {
  # nc = 15 is too high: the data contain only 8 distinct locations, so kmeans()
  # stops with "more cluster centers than distinct data points" beyond 8 centers
  wss <- rep(0, nc)
  for (i in 1:nc) {  # start at 1 so wss[1] is the one-cluster value, not a placeholder 0
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc, wss,
       type = "b")
}
wssplot.distancematrix(distance.matrix)
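One further point not raised above: the geosphere distance functions, distHaversine() included, expect coordinates in longitude, latitude order. If the CSV columns are ID, latitude, longitude (as the values suggest), geodata[, c(3, 2)] passes them in the expected order, whereas geodata[, c(2, 3)] passes latitude as longitude. The sketch below computes the same kind of Haversine distance matrix in base R with the order made explicit. haversine_matrix() is a hypothetical helper, not part of geosphere, and its default radius of 6378137 m is chosen to match distHaversine()'s default.

```r
# Hypothetical helper: pairwise Haversine (great-circle) distances in metres.
# Inputs are vectors of longitude and latitude in decimal degrees.
# r = 6378137 m mirrors the default radius of geosphere::distHaversine().
haversine_matrix <- function(lon, lat, r = 6378137) {
  phi    <- lat * pi / 180   # latitudes in radians
  lambda <- lon * pi / 180   # longitudes in radians
  dphi    <- outer(phi, phi, "-")         # n x n latitude differences
  dlambda <- outer(lambda, lambda, "-")   # n x n longitude differences
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlambda / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1;
  # the matrix is the first argument so its dim attribute is kept.
  2 * r * asin(pmin(sqrt(a), 1))
}

# Usage with three made-up points (longitude first, then latitude)
m <- haversine_matrix(lon = c(0, 1, 77.7642), lat = c(0, 0, 29.4716))
print(round(m[1, 2]))   # one degree of longitude on the equator, in metres

# hclust() accepts a dist object directly; cutree() splits the tree into k groups
groups <- cutree(hclust(as.dist(m)), k = 2)
print(groups)
```

Note also that kmeans() does not cluster on a distance matrix as such: it converts the dist object to a full matrix and treats each row of distances as a feature vector. If a method that works from the distances themselves is preferred, hclust() with cutree(), as in the usage lines above, accepts a dist object; that is hierarchical clustering, not k-means.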