
R som package Kohonen - update example to version 3

I'm trying to get this example to work with version 3 of the kohonen R library: https://clarkdatalabs.github.io/soms/SOM_NBA

I've tried to update the code there and came up with the following, but it's not correct. I get most of the same results as the example, but in the last plot I can't see any classification errors, so I'm doing something wrong. I think I know roughly where my mistake is, but I'm not sure what it is.

# https://clarkdatalabs.github.io/soms/SOM_NBA
# https://github.com/clarkdatalabs/soms/issues?q=is%3Aopen+is%3Aissue


library(kohonen)
library(RColorBrewer)
library(RCurl)

NBA <- read.csv(text = getURL("https://raw.githubusercontent.com/clarkdatalabs/soms/master/NBA_2016_player_stats_cleaned.csv"), 
            sep = ",", header = T, check.names = FALSE)

colnames(NBA)

NBA.measures1 <- c("FTA", "2PA", "3PA")
NBA.SOM1 <- som(scale(NBA[NBA.measures1]), grid = somgrid(6, 4, "rectangular"))
plot(NBA.SOM1)

colors <- function(n, alpha = 1) {
  rev(heat.colors(n, alpha))
}

plot(NBA.SOM1, type = "counts", palette.name = colors, heatkey = TRUE)

par(mfrow = c(1, 2))
plot(NBA.SOM1, type = "mapping", pchs = 20, main = "Mapping Type SOM")
plot(NBA.SOM1, main = "Default SOM Plot")

NBA.SOM2 <- som(scale(NBA[NBA.measures1]), grid = somgrid(6, 6, "hexagonal", toroidal=TRUE) )

par(mfrow = c(1, 2))
plot(NBA.SOM2, type = "mapping", pchs = 20, main = "Mapping Type SOM")
plot(NBA.SOM2, main = "Default SOM Plot")
plot(NBA.SOM2, type = "dist.neighbours", palette.name = terrain.colors)

NBA.measures2 <- c("FTA", "FT", "2PA", "2P", "3PA", "3P", "AST", "ORB", "DRB", 
               "TRB", "STL", "BLK", "TOV")

training_indices <- sample(nrow(NBA), 200)
NBA.training <- scale(NBA[training_indices, NBA.measures2])
NBA.testing <- scale(NBA[-training_indices, NBA.measures2], center = attr(NBA.training, 
"scaled:center"), scale = attr(NBA.training, "scaled:scale"))

NBA.SOM3 <- xyf(NBA.training, classvec2classmat(NBA$Pos[training_indices]), 
            grid = somgrid(13, 13, "hexagonal", toroidal = TRUE), rlen = 100, 
user.weights = 0.5)

pos.prediction <- predict(NBA.SOM3, newdata = NBA.testing, whatmap = 1)
table(NBA[-training_indices, "Pos"], pos.prediction$prediction[[2]])

NBA.SOM4 <- xyf(scale(NBA[, NBA.measures2]), classvec2classmat(NBA[, "Pos"]), 
            grid = somgrid(13, 13, "hexagonal", toroidal = TRUE), rlen = 300, 
user.weights = 0.7)

par(mfrow = c(1, 2))
plot(NBA.SOM4, type = "codes", main = c("Codes X", "Codes Y"))
NBA.SOM4.hc <- cutree(hclust(dist(getCodes(NBA.SOM4, 2))), 5)
add.cluster.boundaries(NBA.SOM4, NBA.SOM4.hc)

bg.pallet <- c("red", "blue", "yellow", "purple", "green")

# make a vector of just the background colors for all map cells

#I think my error is in this line...
position.predictions <- classmat2classvec(predict(NBA.SOM4)$unit.predictions[[2]])


base.color.vector <- bg.pallet[match(position.predictions, levels(NBA$Pos))]

# set alpha to scale with maximum confidence of prediction
bgcols <- c()
max.conf <- apply(getCodes(NBA.SOM4, 2), 1, max)
for (i in 1:length(base.color.vector)) {
  bgcols[i] <- adjustcolor(base.color.vector[i], max.conf[i])
}

par(mar = c(0, 0, 0, 4), xpd = TRUE)
plot(NBA.SOM4, type = "mapping", pchs = 21, col = "black",
     bg = bg.pallet[match(NBA$Pos, levels(NBA$Pos))], bgcol = bgcols)

legend("topright", legend = levels(NBA$Pos), text.col = bg.pallet, bty = "n", 
   inset = c(-0.03, 0))

The kohonen package builds the model by initializing its nodes from randomly selected training observations. It is therefore very rare for one run to end up with exactly the same final arrangement of nodes as someone else's. Nevertheless, the node values will still be the same; only the arrangement is different. At least, that is my opinion. To obtain the exact same arrangement, the two kohonen models should be trained under the same random seed, i.e. by calling set.seed() before training.

In the code you have provided, the variable 'position.predictions' contains some NA values. I think that if you add one more line to omit the NA values after the assignment to 'position.predictions', the node backgrounds will all be filled with colors from the predefined palette. So the script will be:

# this is your script
position.predictions <- classmat2classvec(predict(NBA.SOM4)$unit.predictions[[2]])

# add this below and continue
position.predictions <- na.omit(position.predictions)
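Before dropping the NA values, it may help to see how many there are and which map units they belong to. The following is only a base-R sketch on a toy factor; `toy.predictions` is a hypothetical stand-in for `position.predictions`, not output from the NBA map:

```r
# Hypothetical stand-in for position.predictions: one entry per map unit,
# with NA where no class was assigned.
toy.predictions <- factor(c("C", NA, "PG", "SF", NA, "PF"),
                          levels = c("C", "PF", "PG", "SF", "SG"))

# How many units have no prediction, and which ones are they?
n.missing <- sum(is.na(toy.predictions))
missing.units <- which(is.na(toy.predictions))

# na.omit() drops those entries, so the result is shorter than the input
cleaned <- na.omit(toy.predictions)

print(n.missing)
print(missing.units)
print(length(toy.predictions))
print(length(cleaned))
```

Since the colors passed to `bgcol` are matched to map units by position, it is probably worth comparing the length of the vector with the number of units after any filtering.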

I think the NA values are returned because kohonen is unable to recognize the pattern of its inputs.
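On the reproducibility point above, here is a minimal sketch of training the same map twice under the same seed and comparing the results. The data are synthetic (`toy.data` is a hypothetical stand-in for the scaled NBA measures), and the seed values and grid size are arbitrary choices for illustration:

```r
library(kohonen)

# Synthetic data standing in for the scaled NBA measures (hypothetical)
set.seed(1)
toy.data <- scale(matrix(rnorm(300), ncol = 3,
                         dimnames = list(NULL, c("a", "b", "c"))))

# Set the same seed immediately before each training call
set.seed(42)
som.a <- som(toy.data, grid = somgrid(4, 4, "hexagonal"))
set.seed(42)
som.b <- som(toy.data, grid = somgrid(4, 4, "hexagonal"))

# Compare the codebook vectors and the unit each observation maps to
same.codes <- isTRUE(all.equal(getCodes(som.a), getCodes(som.b)))
same.mapping <- identical(som.a$unit.classif, som.b$unit.classif)
print(same.codes)
print(same.mapping)
```

In the question's script the same idea would presumably mean calling set.seed() once before `sample()` and before each `som()` or `xyf()` call whose layout needs to be reproduced.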
