R - Remove values from different columns in a data frame
I have a dataset in which some columns contain two values that I need to change to NA:
'#DIV/0' and '' (the empty string).
I solved this with a 'for' loop, but I would like to know whether there is another way, such as using 'apply', and which method is faster.
My code:
train <- read.csv('https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv',stringsAsFactors = F)
test <- read.csv('https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv', stringsAsFactors = F)
train2 <- train
for(x in 1:length(train2)){
train2[train2[,x] %in% c('','#DIV/0'),x] <- NA
}
test2 <- test
for(x in 1:length(test2)){
test2[test2[,x] %in% c('','#DIV/0'),x] <- NA
}
We can use the na.strings argument of read.csv to convert these values to NA while the files are being read:
train <- read.csv('https://d396qusza40orc.cloudfront.net/predmachlearn/pml-training.csv',
na.strings=c('#DIV/0', '', 'NA') ,stringsAsFactors = F)
test <- read.csv('https://d396qusza40orc.cloudfront.net/predmachlearn/pml-testing.csv',
na.strings= c('#DIV/0', '', 'NA'),stringsAsFactors = F)
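To see what na.strings does without downloading anything, here is a minimal sketch on a made-up inline CSV. The column names and values are hypothetical and only mimic the problem strings; they are not taken from the course data.

```r
# Hypothetical three-row CSV passed through the `text` argument
csv_text <- "id,ratio,label
1,0.5,a
2,#DIV/0,
3,NA,c"

demo <- read.csv(text = csv_text,
                 na.strings = c("#DIV/0", "", "NA"),
                 stringsAsFactors = FALSE)

# Every string listed in na.strings becomes NA while the data is parsed
is.na(demo$ratio)
is.na(demo$label)

# With the bad strings gone, the ratio column can be read as numeric
class(demo$ratio)
```

Because the replacement happens at parse time, a column that would otherwise be read as character only because of '#DIV/0' can come out as numeric directly.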
Just checking that none of those strings remain:
sum(train=='#DIV/0', na.rm=TRUE)
#[1] 0
sum(test=='#DIV/0', na.rm=TRUE)
#[1] 0
sum(test=='', na.rm=TRUE)
#[1] 0
sum(train=='', na.rm=TRUE)
#[1] 0
The number of NA values:
sum(is.na(train))
#[1] 1921600
sum(is.na(test))
#[1] 2000
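If the data frame is already in memory and re-reading the file is not an option, one possible 'apply'-style alternative to the for loop is lapply over the columns. This is only a sketch: replace_with_na is a hypothetical helper name, and the small data frame is made up for illustration.

```r
# Hypothetical helper: set the given strings to NA in every column
replace_with_na <- function(df, bad = c("", "#DIV/0")) {
  # df[] <- keeps the data.frame structure while replacing each column
  df[] <- lapply(df, function(col) {
    col[col %in% bad] <- NA
    col
  })
  df
}

# Made-up example data, not the course data
demo <- data.frame(a = c("1", "#DIV/0", "3"),
                   b = c("", "x", "y"),
                   n = c(1, 2, 3),
                   stringsAsFactors = FALSE)

cleaned <- replace_with_na(demo)
cleaned
```

Unlike na.strings, this leaves the affected columns as character, so they may still need converting afterwards. Which approach is faster on the real files would have to be measured, for example with system.time.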