
Error in svd(x, nu = 0) : infinite or missing values in 'x' (checked no negative values exist)

I know this is a common error for PCA, but I went through the solutions provided and they are not working.

I followed the answers to: Error in svd(x, nu = 0) : 0 extent dimensions

Below is my code extract:

require(class)
set.seed(2095)
# dataset source:https://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
normalize<-function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}
dataset <- read.csv("data/kdd_data_10pc.csv", header = FALSE, sep = ",")
names   <- read.csv("data/kdd_names.csv", header = FALSE , sep = ";")
names(dataset) <- sapply((1:nrow(names)),function(i) toString(names[i, 1]))

# extracting relevant features
dataset_extracted <- dataset[, c("src_bytes", "dest_bytes", "count", "dst_host_count", "dst_host_same_srv_rate", "dst_host_serror_rate", "label")]

head(dataset_extracted, 3)

log.kdd   <- log(dataset_extracted[, 1:6])
kdd.label <- dataset_extracted[, 7]

kdd.pca <- prcomp(log.kdd,
                  center = TRUE,
                  scale. = TRUE)

The output of summary(dataset_extracted) is as follows:

 summary(dataset_extracted)
   src_bytes           dest_bytes          count       dst_host_count  dst_host_same_srv_rate dst_host_serror_rate      label       
 Min.   :        0   Min.   :      0   Min.   :  0.0   Min.   :  0.0   Min.   :0.0000         Min.   :0.0000       smurf.  :280790  
 1st Qu.:       45   1st Qu.:      0   1st Qu.:117.0   1st Qu.:255.0   1st Qu.:0.4100         1st Qu.:0.0000       neptune.:107201  
 Median :      520   Median :      0   Median :510.0   Median :255.0   Median :1.0000         Median :0.0000       normal. : 97278  
 Mean   :     3026   Mean   :    869   Mean   :332.3   Mean   :232.5   Mean   :0.7538         Mean   :0.1768       back.   :  2203  
 3rd Qu.:     1032   3rd Qu.:      0   3rd Qu.:511.0   3rd Qu.:255.0   3rd Qu.:1.0000         3rd Qu.:0.0000       satan.  :  1589  
 Max.   :693375640   Max.   :5155468   Max.   :511.0   Max.   :255.0   Max.   :1.0000         Max.   :1.0000       ipsweep.:  1247  
                                                                                                                   (Other) :  3713  

Based on the summary, none of the extracted columns has a negative minimum value.

I'm new to machine learning and would appreciate any help. The exact error shown was:

Error in svd(x, nu = 0) : infinite or missing values in 'x'

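You are applying a log transformation to an object (dataset_extracted[, 1:6]) that contains zero values. Because log(0) is -Inf, this produces elements of negative infinity, which is what svd() complains about. Try using log1p() instead; it computes log(1 + x), so zeros map to 0.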
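To make the difference concrete, here is a minimal sketch on a made-up data frame (the `toy` values are invented for illustration and are not taken from the KDD data). It only shows that log() leaves non-finite values behind when zeros are present, while log1p() does not, so prcomp() can run:

```r
# Toy stand-in for two of the extracted columns (values are invented)
toy <- data.frame(
  src_bytes = c(0, 45, 520, 1032, 3026),
  count     = c(0, 117, 510, 511, 332)
)

log_toy   <- log(toy)    # log(0) is -Inf
log1p_toy <- log1p(toy)  # log1p(0) is log(1 + 0) = 0

# Check each transformed table for infinite or missing values
any_bad_log   <- any(!is.finite(as.matrix(log_toy)))
any_bad_log1p <- any(!is.finite(as.matrix(log1p_toy)))

# PCA runs on the log1p-transformed data
toy_pca <- prcomp(log1p_toy, center = TRUE, scale. = TRUE)
```

The same is.finite() check can be run on log.kdd before calling prcomp() to confirm where the non-finite values come from.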

Also, don't forget to apply the standardisation you defined in the normalize() function; the code above never calls it.
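One way normalize() could be applied is column by column, sketched below on invented data (`toy` and `toy_norm` are hypothetical names). Calling normalize() directly on a whole data frame would use the minimum and maximum over all columns together, which is probably not what is wanted here:

```r
# Same min-max scaling function as in the question
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

# Toy stand-in for the extracted numeric columns (values are invented)
toy <- data.frame(
  src_bytes = c(0, 45, 520, 1032),
  count     = c(0, 117, 510, 511)
)

# Apply log1p first, then rescale each column separately to [0, 1]
toy_norm <- as.data.frame(lapply(log1p(toy), normalize))

col_mins <- sapply(toy_norm, min)
col_maxs <- sapply(toy_norm, max)
```

Note that prcomp(..., scale. = TRUE) rescales every column to unit variance anyway, so min-max scaling beforehand mainly makes a difference when scale. is set to FALSE.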

Finally, given the magnitude of some of the outliers (for example, the maximum of src_bytes is 693375640 against a median of 520), I'm not sure a log transformation will be sufficient. You may need to consider excluding some observations.
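As a rough illustration of excluding extreme observations, the sketch below drops rows above a quantile cutoff. The cutoff of 0.75 and the `toy` data are arbitrary assumptions chosen to keep the example small; the answer does not say which rule to use, and a sensible threshold depends on the data:

```r
# Toy data with one extreme src_bytes value (values are invented)
toy <- data.frame(
  src_bytes = c(0, 45, 520, 1032, 693375640),
  count     = c(0, 117, 510, 511, 511)
)

# Hypothetical rule: keep rows at or below the 75th percentile of src_bytes
cutoff      <- quantile(toy$src_bytes, probs = 0.75)
toy_trimmed <- toy[toy$src_bytes <= cutoff, ]
```

Whether dropping rows is appropriate at all is a modelling decision, since large transfers may be exactly the observations of interest.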
