
How to map/plot new data onto a trained SOM map?

After training a SOM, how can I plot new data onto it and visualise where the new observations land? Ideally, each new point would be drawn at its node location and in the colour of its cluster. identify() can pinpoint nodes selected on the SOM plot, but it is very limited and handles only one selection at a time; I would like to map and visualise a whole (new) dataset. I can get the node location from map() and, from that, the cluster membership, but how can I manually plot the new points onto the SOM? I couldn't find anything pertinent online or in the kohonen R documentation. I'd appreciate any help.

library(kohonen)
data(wines)
wines.train<-wines[1:150,]
wines.test<-wines[151:nrow(wines),]
wines.sc <- scale(wines.train)

set.seed(7)
wines.som<-som(wines.sc, grid = somgrid(5, 4, "hexagonal"),rlen=150,alpha=c(0.05,0.01))
wines.hc<-cutree(hclust(dist(wines.som$codes[[1]])),6) 
plot(wines.som,type="mapping",bgcol=rainbow(6)[wines.hc])
add.cluster.boundaries(wines.som,wines.hc)

identify() can be used to manually inspect specific nodes on the SOM:

identify(wines.som$grid$pts,labels=as.vector(wines.hc),plot=T,pos=T) 

Map the new data onto the trained SOM:

wines.map<-map(wines.som,scale(wines.test))
wines.test.grp<-sapply(wines.map$unit.classif,function(x) wines.hc[[x]])

One thing to note first: you should not scale the test data using its own mean and standard deviation. Scale it with the scaling parameters of the training data instead, because the model was trained only on the training data and has never seen the test data.

So your scaled test data would be like this:

wines.test.scale <- scale(wines.test, center = attr(wines.sc, 'scaled:center'), scale = attr(wines.sc, 'scaled:scale'))
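To see why this matters, here is a small sketch on made-up numbers (the matrices train and test below are toy data, not the wines set). It compares scaling the new rows with the training parameters against scaling them on their own:

```r
# Toy data (hypothetical): 5 training rows, 2 new rows, 2 variables
train <- matrix(c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50), ncol = 2)
test  <- matrix(c(6, 7, 60, 70), ncol = 2)

train.sc <- scale(train)

# Scale the new rows with the training mean and standard deviation
test.ok <- scale(test,
                 center = attr(train.sc, "scaled:center"),
                 scale  = attr(train.sc, "scaled:scale"))

# Scale the new rows with their own mean and standard deviation
test.bad <- scale(test)

# test.ok keeps the new rows on the training scale (both lie above the
# training mean), while test.bad re-centres them around zero
print(test.ok)
print(test.bad)
```

With its own parameters the new data is always centred on zero, so any shift relative to the training data is lost before the SOM ever sees it.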

Next, compute the distance from each observation to every node of the model and store the result as a new element of the model object. Because the data were split into training and test sets, there are two such elements, one for the training distances and one for the test distances. I call them train.map and test.map, since this step can be regarded as mapping the input data onto the model's map. Each of them is a matrix with one row per node and one column per observation.

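# Outer apply: one training observation at a time (input1).
# Inner apply: Euclidean distance from that observation to each node (input2).
wines.som$train.map <- apply(
  wines.sc, 1, function(input1) {
    apply(
      wines.som$codes[[1]], 1, function(input2) dist(rbind(input1, input2))
    )
  }
)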

wines.som$test.map <- apply(
  wines.test.scale, 1, function(input1) {
    apply(
      wines.som$codes[[1]], 1, function(input2) dist(rbind(input1, input2))
    )
  }
)
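The nested apply() calls can be slow on larger data. As a sketch only, the same node-by-observation distance matrix can be computed with a single dist() call. The helper node_dist() and the toy matrices codes and newdata are names I made up for illustration; with the real model you would pass wines.som$codes[[1]] and wines.test.scale instead.

```r
# Toy stand-ins (hypothetical): 'codes' plays the role of wines.som$codes[[1]],
# 'newdata' the role of wines.test.scale
set.seed(1)
codes   <- matrix(rnorm(20 * 3), nrow = 20)
newdata <- matrix(rnorm(5 * 3), nrow = 5)

# Euclidean distance from every node (rows) to every observation (columns)
node_dist <- function(codes, newdata) {
  n_codes <- nrow(codes)
  # All pairwise distances among the stacked rows; wasteful but simple
  d <- as.matrix(dist(rbind(codes, newdata)))
  # Keep only the node-versus-observation block
  d[seq_len(n_codes), n_codes + seq_len(nrow(newdata)), drop = FALSE]
}

dmat <- node_dist(codes, newdata)

# Index of the closest node for each observation
bmu <- apply(dmat, 2, which.min)
print(dim(dmat))
print(bmu)
```

For a single-layer SOM like this one, I would expect the closest node found this way to agree with map(wines.som, wines.test.scale)$unit.classif, but I have not checked that for every distance setting of the package.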

Storing the distances inside the model object is a convenience rather than a requirement. plot() is a generic function: it dispatches to the package's plotting method because wines.som has class kohonen, and adding extra list elements does not change that class. The property argument itself only needs a numeric vector with one value per node.

Now you can plot, for each individual observation, its distance to every node of the network. There are two stages here: first the training data, then the test data. With par(mfrow = c(5,5)) the plots are drawn 25 to a page.

par(mfrow = c(5,5))
for (a in 1:ncol(wines.som$train.map)) {
  plot(
    wines.som, type = 'property', property = wines.som$train.map[,a],
    main = paste('train',a) 
  )
}

par(mfrow = c(5,5))
for (a in 1:ncol(wines.som$test.map)) {
  plot(
    wines.som, type = 'property', property = wines.som$test.map[,a],
    main = paste('test',a) 
  )
}
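If what you want is the original request, i.e. the new points drawn at their node with their cluster colour, one possible sketch is to overlay them by hand with points() on the node coordinates in wines.som$grid$pts. The names test.unit, test.grp and pts, the jitter size and the plotting symbol are my own choices here, not something prescribed by the package.

```r
library(kohonen)

data(wines)
wines.train <- wines[1:150, ]
wines.test  <- wines[151:nrow(wines), ]
wines.sc    <- scale(wines.train)
# Scale the test data with the training parameters
wines.test.scale <- scale(wines.test,
                          center = attr(wines.sc, "scaled:center"),
                          scale  = attr(wines.sc, "scaled:scale"))

set.seed(7)
wines.som <- som(wines.sc, grid = somgrid(5, 4, "hexagonal"),
                 rlen = 150, alpha = c(0.05, 0.01))
wines.hc  <- cutree(hclust(dist(wines.som$codes[[1]])), 6)

# Winning node and cluster of each test observation
test.unit <- map(wines.som, wines.test.scale)$unit.classif
test.grp  <- wines.hc[test.unit]

# Base picture: training data on the map, nodes coloured by cluster
plot(wines.som, type = "mapping", bgcol = rainbow(6)[wines.hc])
add.cluster.boundaries(wines.som, wines.hc)

# Overlay the test observations at their node centres, jittered so that
# points falling on the same node do not sit exactly on top of each other
pts <- wines.som$grid$pts[test.unit, , drop = FALSE]
points(pts[, 1] + rnorm(nrow(pts), sd = 0.12),
       pts[, 2] + rnorm(nrow(pts), sd = 0.12),
       pch = 24, col = "black", bg = rainbow(6)[test.grp])
```

The filled triangles with a black outline are only there to keep the test points distinguishable from the training points and from the node background, which has the same cluster colour.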
