Clustering using lat/lon data in R

I am very new to R. I am doing cluster analysis on latitude and longitude data and then want to plot the results on a Google map. My data is very limited: only 20 points. As far as I know, k-means is suitable for this, and for the distance calculation I want to use the Haversine distance (https://www.slideshare.net/AnbarasanS2/clusteranalysis-58192369). I also tried density-based clustering, but it gave very poor results, so I want to stay with k-means. My dataset (ID, latitude, longitude) and code are given below:

1   27.9745 79.0028
2   29.4716 77.7642
3   30.9688 76.5256
4   29.4716 77.7642
5   29.4716 77.7642
6   29.4716 77.7642
7   29.4716 77.7642
8   25.5648 83.4477
9   26.2946 79.041
10  22.5293 77.178
11  26.2946 79.041
12  30.7896 76.4973
13  26.2946 79.041
14  28.1856 72.2447
15  28.1856 72.2447
16  28.1856 72.2447
17  28.1856 72.2447
18  28.1856 72.2447
19  28.1856 72.2447
20  28.1856 72.2447
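To reproduce this without `test.csv`, the 20 points can be typed in directly. This sketch assumes column 2 of the file is latitude and column 3 is longitude (the values lie in northern India, which fits that reading); the column names `id`, `lat` and `lon` are hypothetical. It also shows that the 20 rows contain only 8 distinct locations.

```r
# Hypothetical column names: the header of test.csv is not shown in the question
geodata <- data.frame(
  id  = 1:20,
  lat = c(27.9745, 29.4716, 30.9688, rep(29.4716, 4),
          25.5648, 26.2946, 22.5293, 26.2946, 30.7896, 26.2946,
          rep(28.1856, 7)),
  lon = c(79.0028, 77.7642, 76.5256, rep(77.7642, 4),
          83.4477, 79.041, 77.178, 79.041, 76.4973, 79.041,
          rep(72.2447, 7))
)

# Count distinct coordinate pairs: kmeans() cannot use more centers than this
n_distinct <- nrow(unique(geodata[, c("lat", "lon")]))
print(n_distinct) # [1] 8
```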

Code:

geodata = read.csv('test.csv')

#K-means clustering
#Compute the distance matrix using Geosphere package.
geo.dist <- function(df) {
  require(geosphere)
  d <- function(i,z) {
    dist <-rep(0,nrow(z))
    dist[i:nrow(z)] <-
      distHaversine(z[i:nrow(z),1:2],z[i,1:2])
    return(dist)
  }
  dm <- do.call(cbind,lapply(1:nrow(df), d, df))
  return(as.dist(df))
}

distance.matrix <-geo.dist(geodata[,c(2,3)])

#Determine the no.of clusters
wssplot.distancematrix <- function(data, nc = 15, seed = 1234) {
  wss <-rep(0,15)
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc,wss,
       type = "b")
}

wssplot.distancematrix(distance.matrix)

But I got this error:

Error in dimnames(df) <- if (is.null(labels)) list(seq_len(size), seq_len(size)) else list(labels, :
  length of 'dimnames' [1] not equal to array extent
In addition: Warning message:
In df[row(df) > col(df)] <- x :
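For reference, a minimal sketch that reproduces this error, using a hypothetical `coords` frame in place of `geodata[, c(2, 3)]`. Because `geo.dist()` ends with `as.dist(df)`, `as.dist()` receives the 20 × 2 coordinates: it records a size of 20 (the row count) but takes its labels from the two column names. When `kmeans()` later calls `as.matrix()` on that object, two labels cannot name a 20 × 20 matrix, which is the `dimnames` error above.

```r
# Hypothetical stand-in for geodata[, c(2, 3)]: 20 rows, 2 columns
coords <- data.frame(lat = seq(22, 31, length.out = 20),
                     lon = seq(72, 83, length.out = 20))

# The bug: as.dist() is given the coordinates instead of the distance matrix
bad <- suppressWarnings(as.dist(coords))
attr(bad, "Size")    # 20, taken from nrow(coords)
attr(bad, "Labels")  # "lat" "lon", taken from the column names

# kmeans() calls as.matrix() on its input; this is where the error is raised
err <- tryCatch(suppressWarnings(as.matrix(bad)), error = function(e) e)
conditionMessage(err)
```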

How can I run k-means clustering on these points and plot the results on a Google map?

Thanks in advance.

Regards, Nikita

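You have two errors in your code, marked with comments below. There is also a third problem that produces no error message: geosphere's distance functions expect points in (longitude, latitude) order, but column 2 of your data is latitude and column 3 is longitude, so the distances come out wrong unless the columns are swapped.

library(geosphere)

geo.dist <- function(df) {
  # d() builds column i of the distance matrix: Haversine distances (metres)
  # from point i to points i..n, with 0 above the diagonal
  d <- function(i, z) {
    dist <- rep(0, nrow(z))
    dist[i:nrow(z)] <- distHaversine(z[i:nrow(z), 1:2], z[i, 1:2])
    return(dist)
  }
  dm <- do.call(cbind, lapply(1:nrow(df), d, df))
  return(as.dist(dm)) # return should be dm (the distances), not df (the coordinates)
}

# geosphere expects (lon, lat), so pass column 3 (longitude) before column 2 (latitude)
distance.matrix <- geo.dist(geodata[, c(3, 2)])

# Determine the number of clusters with an elbow plot
wssplot.distancematrix <- function(data, nc = 8, seed = 1234) {
  wss <- rep(0, nc) # nc = 15 is too high: there are only 8 distinct locations,
                    # and kmeans() stops when centers exceed the distinct points
  for (i in 1:nc) { # start at 1 so the first plotted point is a real value, not 0
    set.seed(seed)
    # kmeans() coerces the dist object with as.matrix(), so each point is
    # clustered on its row of distances to every point
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc, wss, type = "b",
       xlab = "Number of clusters", ylab = "Within-cluster sum of squares")
}

wssplot.distancematrix(distance.matrix)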
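Once the elbow plot suggests a value of k, the cluster labels can be extracted. The sketch below is one possible way, not the answerer's code: the hypothetical helper `haversine_matrix()` computes Haversine distances in base R (radius 6378137 m, the same default as `geosphere::distHaversine()`), which sidesteps the lon/lat argument order entirely, and then, as in the answer, runs `kmeans()` on the rows of the distance matrix. Keep in mind that k-means itself minimizes squared Euclidean distance to cluster means, so this clusters points by their distance profiles rather than by Haversine distance to a centre. `k <- 4` is a hypothetical choice, not a value read off the plot.

```r
# Hypothetical helper: pairwise Haversine distances in metres, base R only.
# r = 6378137 m matches the default radius of geosphere::distHaversine().
haversine_matrix <- function(lat, lon, r = 6378137) {
  phi     <- lat * pi / 180
  lambda  <- lon * pi / 180
  dphi    <- outer(phi, phi, "-")
  dlambda <- outer(lambda, lambda, "-")
  a <- sin(dphi / 2)^2 + outer(cos(phi), cos(phi)) * sin(dlambda / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * r * asin(pmin(sqrt(a), 1))
}

# The 20 points from the question (column 2 = latitude, column 3 = longitude)
lat <- c(27.9745, 29.4716, 30.9688, rep(29.4716, 4),
         25.5648, 26.2946, 22.5293, 26.2946, 30.7896, 26.2946, rep(28.1856, 7))
lon <- c(79.0028, 77.7642, 76.5256, rep(77.7642, 4),
         83.4477, 79.041, 77.178, 79.041, 76.4973, 79.041, rep(72.2447, 7))

dm <- haversine_matrix(lat, lon)

# As in the answer, k-means runs on the rows of the distance matrix
set.seed(1234)
k <- 4  # hypothetical choice: read k off the elbow plot instead
fit <- kmeans(dm, centers = k, nstart = 25)

clusters <- data.frame(lat = lat, lon = lon, cluster = fit$cluster)
print(clusters)
```

Points at the same location always land in the same cluster, since their rows of `dm` are identical. The `clusters` data frame (lat, lon, cluster) is the kind of input that mapping packages such as ggmap or leaflet work with, though neither is shown here.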
