I'm trying to cluster MLB data by starting pitcher's name. I've combed through the data I'm using and there are no NA values, and I drop any with na.omit() in the code below anyway. clustdata looks completely fine to me, but I get this warning and error:
```
NAs introduced by coercion
Error in hclust(d, method = "single", members = clustdata[, 1]): NA/NaN/Inf in foreign function call (arg 7)
```
I want to cluster the pitchers in that table by those attributes. Does anyone have any ideas? Thanks! I'm new to R.
```r
data7 = read.csv("GL2007.csv", header = TRUE)
data8 = data.frame(na.omit(data7[c(10,23,24,25,26,30,31,33,105)]))
scoreagg = aggregate(v_score ~ h_starting_pitcher_name, data8, mean)
hitsagg = aggregate(v_hits ~ h_starting_pitcher_name, data8, mean)
doubagg = aggregate(v_doubles ~ h_starting_pitcher_name, data8, mean)
tripagg = aggregate(v_triples ~ h_starting_pitcher_name, data8, mean)
hragg = aggregate(v_homeruns ~ h_starting_pitcher_name, data8, mean)
hbpagg = aggregate(v_hit_by_pitch ~ h_starting_pitcher_name, data8, mean)
walksagg = aggregate(v_walks ~ h_starting_pitcher_name, data8, mean)
SOagg = aggregate(v_strikeouts ~ h_starting_pitcher_name, data8, mean)
clustdata = data.frame(scoreagg$h_starting_pitcher_name, scoreagg$v_score, hitsagg$v_hits, doubagg$v_doubles, tripagg$v_triples, hragg$v_homeruns, hbpagg$v_hit_by_pitch, walksagg$v_walks, SOagg$v_strikeouts)
library(NbClust)
d = dist(as.matrix(clustdata[,2:9]), method = "euclidean")
hc_1 = hclust(d, method = "single", members = clustdata[,1])
```
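Since the question doesn't give many details, this is partly a guess, but it looks like you are not using the `members` argument correctly. In `hclust()`, `members` is meant to be a numeric vector giving the number of observations in each cluster, for the case where `d` holds dissimilarities between clusters rather than between single observations. It is not a way to attach labels. Passing the pitcher names makes R convert those strings to numbers, which yields NAs (hence the "NAs introduced by coercion" warning), and the underlying Fortran routine then fails on them.
Just leave `members` at its default of NULL if your aim is only to obtain a clustering: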
```r
hc_1 = hclust(d, method = "single")
```
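If you also want the pitcher names on the dendrogram, one way is to put them in the row names of the matrix before calling `dist()`; `hclust()` then carries them as leaf labels. Below is a minimal sketch of the whole pipeline, assuming a tiny made-up `data8` with only three of the stat columns (the pitcher names and values are hypothetical, for illustration only). It also swaps the eight separate `aggregate()` calls for a single `cbind()` formula, which keeps every mean on the same row as its pitcher.

```r
# Hypothetical toy data standing in for data8 (names and values are made up)
data8 <- data.frame(
  h_starting_pitcher_name = c("Pitcher A", "Pitcher A", "Pitcher B",
                              "Pitcher B", "Pitcher C", "Pitcher C"),
  v_score      = c(3, 5, 1, 2, 7, 6),
  v_hits       = c(8, 10, 4, 5, 12, 11),
  v_strikeouts = c(6, 4, 9, 8, 3, 2)
)

# One aggregate() call with cbind() averages every stat per pitcher,
# so all the means stay on the same row as their pitcher's name
clustdata <- aggregate(cbind(v_score, v_hits, v_strikeouts) ~ h_starting_pitcher_name,
                       data = data8, FUN = mean)

# Use the names as row names so dist() and hclust() keep them as labels
m <- as.matrix(clustdata[, -1])
rownames(m) <- as.character(clustdata$h_starting_pitcher_name)

d <- dist(m, method = "euclidean")
hc_1 <- hclust(d, method = "single")  # members left at its default (NULL)

print(hc_1$labels)               # pitcher names attached to the leaves
groups <- cutree(hc_1, k = 2)    # cluster number for each pitcher
print(groups)
# plot(hc_1) would draw the dendrogram with the pitcher names on the leaves
```

With the real data you would keep your `read.csv()` and `na.omit()` lines and list all eight `v_` columns inside `cbind()`. `cutree()` is just one way to turn the tree into a fixed number of groups; `k = 2` here is arbitrary.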