How to remove a special character from a data frame
I have imported data from a URL and converted it to a data frame using the following code:
library(XML)  # provides readHTMLTable
url <- "http://apims.doe.gov.my/v2/hourly2.php"
tables <- readHTMLTable(url)
try <- do.call(rbind, lapply(tables, data.frame, stringsAsFactors = FALSE))
The data has a '*' next to each number. I would like to isolate the numbers only, so instead of
52* 45* 67* 55*
I would have
52 45 67 55
I have tried several methods to strip the * character from the 3rd through 8th columns and convert those columns to numeric, but since this character also has a special meaning in R, none of them work. I have tried:
x <- "~!@#$%^&*"
str_replace_all(x, as.character(try[,3:8]), " ")
I have also tried:
gsub("*","",try[,3:8])
The only functions that have identified the * characters correctly are grep and grepl, but I need another function that will use the grep output to remove the '*' special character.
grep('*',try)
Try this:
dat <- do.call(rbind, lapply(tables, data.frame, stringsAsFactors = FALSE))
# Strip a trailing "*" from every column except the first two, then convert to numeric
dat[, -(1:2)] <- sapply(dat[, -(1:2)], function(col) {
  as.numeric(sub("[*]$", "", col))
})
head(dat)
#        NEGERI...STATE            KAWASAN.AREA MASA.TIME06.00AM MASA.TIME07.00AM MASA.TIME08.00AM MASA.TIME09.00AM MASA.TIME10.00AM MASA.TIME11.00AM
# NULL.1          Johor             Kota Tinggi               52               53               52               50               50               49
# NULL.2          Johor             Larkin Lama               51               51               51               NA               51               51
# NULL.3          Johor                    Muar               45               45               45               45               45               45
# NULL.4          Johor            Pasir Gudang               56               56               55               56               56               56
# NULL.5          Kedah              Alor Setar               50               50               50               50               50               49
# NULL.6          Kedah Bakar Arang, Sg. Petani               NA               NA               NA               NA               NA               NA
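As for why the attempts in the question did not work: an unescaped * is a regex quantifier rather than a literal asterisk, and gsub coerces its input with as.character, so passing several data frame columns at once does not operate cell by cell. Below is a minimal sketch of the usual ways to match a literal asterisk, applied one column at a time. The data frame demo and its column names (state, area, t06, t07) are made up for illustration and are not taken from the scraped table.

```r
# Toy stand-in for the scraped table (hypothetical values and column names)
demo <- data.frame(
  state = c("Johor", "Kedah"),
  area  = c("Muar", "Alor Setar"),
  t06   = c("52*", "45*"),
  t07   = c("67*", "#"),
  stringsAsFactors = FALSE
)

# Three equivalent ways to match a literal asterisk
a <- gsub("*", "", "52*", fixed = TRUE)  # treat the pattern as plain text
b <- gsub("\\*", "", "52*")              # escape the metacharacter
c3 <- gsub("[*]", "", "52*")             # put it inside a character class

# Clean every column except the two label columns, one column at a time,
# so the result stays a data frame with numeric columns
num_cols <- setdiff(names(demo), c("state", "area"))
demo[num_cols] <- lapply(demo[num_cols], function(col) {
  # Entries that are still not numbers after stripping "*" become NA
  suppressWarnings(as.numeric(gsub("*", "", col, fixed = TRUE)))
})

str(demo)
```

Using lapply with demo[num_cols] <- ... keeps each column as its own vector, whereas sapply builds a matrix first; both work here, so the choice is mostly a matter of taste.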