
Why are NAs introduced by coercion when I convert a factor to numeric in a data frame?

I have a data frame with 30 rows and 100 columns. Some columns contain "nan" and "inf" values. For instance, I created a sample of my data frame like this:

 test<-data.frame(a=c("inf",1,"inf"),b=c("nan",3,"nan"))
 row.names(test)<-c("w1","w2","w3")

When I wanted to change inf and nan to zero, I tried the following code:

 na_codes<-"inf|nan"
 # Attempt 1
 test<-apply(test, 2, function(x){
   ifelse(x %in% na_codes, 0, x)
 })
 # Attempt 2
 test<-as.data.frame(lapply(test, function(x) {
   levels(x)[levels(x) %in% na_codes] <- 0
   x
 }))

But only the following code gave me the desired output:

 test<-type.convert(sub("inf|nan", 0, as.matrix(test))) 

But the class of my data changed to factor! When I wanted to normalize my data, I used this code:

 normalize<-function(x){ return((x-min(x))/(max(x)-min(x))) } 

 norm_test<-sapply(data.frame(test),normalize) 

It crashes with the following message:

  Error in Summary.factor(766L, na.rm = FALSE) : 'min' not meaningful for factor 

I wanted to convert the factor to numeric, so I used this code:

 norm_test<-sapply(data.frame(as.numeric(as.character(test))),normalize) 

Unfortunately, it also fails, returning the following warning:

 Warning message: In data.frame(as.numeric(as.character(num_base))) : NAs introduced by coercion 

Actually, this code works fine for the test sample mentioned above; I only run into these errors with my real data.

I need to understand why this crash happens and how I can prevent this kind of error.

Thanks a lot!

That seems like a very convoluted way of replacing NAs and Infs. Unfortunately you don't share any sample data, nor do you provide details on the function normalize, so I'm not sure what your data looks like.
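Since the real data isn't shown, one way to find out what triggers "NAs introduced by coercion" is to look for entries that are present as text but turn into NA after conversion. The following is only a sketch: the vector x and its contents are made up for illustration and stand in for one column of the real data.

```r
# Hypothetical column standing in for one column of the real data
x <- factor(c("1.5", "2,7", "n/a", "3"))

# Convert via the labels (as.character), not the internal integer codes;
# the coercion warning is silenced here because we inspect the NAs ourselves
num <- suppressWarnings(as.numeric(as.character(x)))

# Entries that were not missing as text but are NA as numbers are the
# ones that as.numeric() could not parse
bad <- is.na(num) & !is.na(x)
bad_values <- unique(as.character(x)[bad])
bad_values
```

In this made-up example the unparseable entries are "2,7" (decimal comma) and "n/a". Running the same check column by column on the real data should show which strings need cleaning before the conversion.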

In the following I assume that you have a matrix or data.frame with numeric values, and some entries that are NA or Inf.

How about this instead:

# Sample data
set.seed(2017);
df <- matrix(rnorm(20), ncol = 4);
df[2, 2] <- Inf;
df[3, 3] <- NA;

# Replace NAs and Infs with 0
df[is.na(df) | is.infinite(df)] <- 0;
df;
#            [,1]         [,2]       [,3]       [,4]
#[1,]  1.43420148  0.451905527  0.3427681  1.1944265
#[2,] -0.07729196  0.000000000  1.5724254 -0.4820681
#[3,]  0.73913723 -0.001524259  0.0000000  1.3178624
#[4,] -1.75860473 -0.265336001  0.3066498 -1.1298316
#[5,] -0.06982523  1.563222619 -1.4304858 -0.9263514
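If the columns are stored as text or factors, as in the sample from the question, the same idea can be applied column by column. Converting per column matters because as.character() on a whole data frame deparses each column into one string (something like "c(1, 2, 1)"), which as.numeric() cannot parse. This is a minimal sketch under assumptions: the helper name to_numeric_zero is hypothetical, and it assumes "inf", "-inf" and "nan" (in any letter case) are the only markers that should become 0.

```r
# Sample data modelled on the question, with factor columns
test <- data.frame(a = c("inf", 1, "inf"),
                   b = c("nan", 3, "nan"),
                   row.names = c("w1", "w2", "w3"),
                   stringsAsFactors = TRUE)

# Hypothetical helper: replace the markers with "0", then convert to numeric
to_numeric_zero <- function(x) {
  x <- as.character(x)  # use the labels, not the factor's integer codes
  x[tolower(trimws(x)) %in% c("inf", "-inf", "nan")] <- "0"
  as.numeric(x)
}

# test[] keeps the data frame structure and the row names
test[] <- lapply(test, to_numeric_zero)

# Min-max normalization as defined in the question
normalize <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}

norm_test <- sapply(test, normalize)
norm_test
```

Any other non-numeric string would still produce NA with a coercion warning here, which is useful as a signal that the data needs more cleaning. Note also that a column whose values are all identical makes normalize divide by zero and return NaN.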
