Using the example dataframe:
df <- structure(list(
KY27PHY1 = c("4", "5", "5", "4", "-", "4", "2","3", "5", "-", "4", "3", "3", "5", "5"),
KY27PHY2 = c("4", "4","4", "4", "-", "5", "2", "3", "5", "-", "5", "3", "3", "5", "5"),
KY27PHY3 = c("5", "4", "4", "4", "-", "5", "1", "4", "5","-", "4", "3", "3", "5", "5")),
.Names = c("KY27PHY1", "KY27PHY2","KY27PHY3"),
row.names = 197:211,
class = "data.frame")
I have been using the following code to convert the values to numeric:
df$KY27PHY1<-as.numeric(df$KY27PHY1)
df$KY27PHY2<-as.numeric(df$KY27PHY2)
df$KY27PHY3<-as.numeric(df$KY27PHY3)
Since I have missing values in the df dataframe, I always get the warning message:
Warning message:
NAs introduced by coercion
I presume this isn't a problem, but I would like some advice on how I might improve the code so I don't get this message.
Also, how can I convert all the columns (specified by name) in one go?
Many thanks in advance.
I see two possibilities:
The unlikely one is that you built your data.frame in R. In that case, just change your code to create integer vectors in the first place, or replace - with NA so the as.numeric conversion won't complain.
The more likely one is that your data.frame came from outside R and you read it with one of the read.table or read.csv functions. In that case, just add na.strings = "-" to your call and R will know that these - are to be understood as NA. Also, if there are no other unexpected items in these columns, the type.convert function called inside these functions will automatically detect that these columns are full of integers and store them as such.
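Both possibilities can be sketched roughly as follows. The small data frame and the inline CSV text are made-up stand-ins for the real data, and the object names (df_small, df_fixed, csv_text, df_read) are only illustrative.

```r
# Hypothetical miniature of the question's data (all columns are character)
df_small <- data.frame(
  KY27PHY1 = c("4", "5", "-", "3"),
  KY27PHY2 = c("4", "-", "2", "3"),
  stringsAsFactors = FALSE
)

# Possibility 1: the data was built in R.
# Recode "-" as NA first, then convert; no coercion warning is raised
# because NA converts to NA silently.
df_fixed <- df_small
df_fixed[df_fixed == "-"] <- NA
df_fixed[] <- lapply(df_fixed, as.numeric)

# Possibility 2: the data is read from a file.
# Declaring "-" in na.strings lets the reader store the columns as integers.
# read.csv(text = ...) stands in for reading an actual file here.
csv_text <- "KY27PHY1,KY27PHY2\n4,4\n5,-\n-,2\n3,3"
df_read <- read.csv(text = csv_text, na.strings = "-")

str(df_fixed)
str(df_read)
```

With a real file, the text argument would be replaced by the file path, and na.strings can take several values if more than one placeholder is used for missing data.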
data.table is fast and worth considering whenever you work with large data.frames. For your question that would be:
library(data.table)
dt = as.data.table(df)
dt[,lapply(.SD,as.numeric)]
    KY27PHY1 KY27PHY2 KY27PHY3
 1:        4        4        5
 2:        5        4        4
 3:        5        4        4
 4:        4        4        4
 5:       NA       NA       NA
 6:        4        5        5
 7:        2        2        1
 8:        3        3        4
 9:        5        5        5
10:       NA       NA       NA
11:        4        5        4
12:        3        3        3
13:        3        3        3
14:        5        5        5
15:        5        5        5
Of course, you get some warnings, because "-" cannot be converted to a number.
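If only some columns should be converted, selected by name, one possible variant uses .SDcols together with := to update the table in place. This is a sketch on made-up data, assuming the data.table package is installed; cols and to_num are illustrative names, and recoding "-" before conversion is what avoids the warning.

```r
library(data.table)

# Hypothetical miniature table: two score columns plus one to leave alone
dt <- data.table(
  KY27PHY1 = c("4", "-", "2"),
  KY27PHY2 = c("5", "3", "-"),
  label    = c("a", "b", "c")
)

# Columns to convert, specified by name
cols <- c("KY27PHY1", "KY27PHY2")

# Turn "-" into NA before converting, so as.numeric has nothing to warn about
to_num <- function(x) as.numeric(replace(x, x == "-", NA))

# Update the named columns by reference; other columns are untouched
dt[, (cols) := lapply(.SD, to_num), .SDcols = cols]

str(dt)
```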
You can use sapply to do them all at once, but you will end up with a matrix, so you have to wrap the result in as.data.frame to convert it back. The warnings are just there to tell you that your original data contained values that could not be parsed as numbers, so they were replaced by NA. In your case these values were "-". To ensure the warnings do not print, use suppressWarnings:
suppressWarnings(as.data.frame(sapply(df,as.numeric)))
   KY27PHY1 KY27PHY2 KY27PHY3
1         4        4        5
2         5        4        4
3         5        4        4
4         4        4        4
5        NA       NA       NA
6         4        5        5
7         2        2        1
8         3        3        4
9         5        5        5
10       NA       NA       NA
11        4        5        4
12        3        3        3
13        3        3        3
14        5        5        5
15        5        5        5
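A related variant, sketched here on made-up data, converts only the columns named in a character vector and assigns the result back with lapply, which avoids the detour through a matrix. The names df_demo and cols are illustrative, and the label column is there only to show that unlisted columns are left alone.

```r
# Hypothetical miniature data frame with one column that must stay character
df_demo <- data.frame(
  KY27PHY1 = c("4", "-", "2"),
  KY27PHY2 = c("5", "3", "-"),
  label    = c("a", "b", "c"),
  stringsAsFactors = FALSE
)

# Columns to convert, specified by name
cols <- c("KY27PHY1", "KY27PHY2")

# lapply returns a list, so assigning into df_demo[cols] keeps the data frame;
# suppressWarnings hides the expected "NAs introduced by coercion" message
df_demo[cols] <- suppressWarnings(lapply(df_demo[cols], as.numeric))

str(df_demo)
```

Note that suppressWarnings hides every warning raised inside the call, so it is best kept around the conversion only.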
I wrote a small function some time back to handle making certain values in a data.frame into NA and using type.convert to convert the output, as if you had used read.table with na.strings specified.
Here's the function:
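makemeNA <- function(mydf, NAStrings, fixed = TRUE) {
  # With fixed = FALSE, NAStrings is a regular expression: blank out every
  # match, then treat the empty string as the NA marker.
  if (!isTRUE(fixed)) {
    mydf <- data.frame(lapply(mydf, function(x) gsub(NAStrings, "", x)))
    NAStrings <- ""
  }
  # Re-detect each column's type, as read.table would with na.strings set.
  # as.is = FALSE keeps non-numeric columns as factors; R >= 4.0.0 warns
  # when as.is is left unspecified.
  mydf <- data.frame(lapply(mydf, function(x) type.convert(
    as.character(x), na.strings = NAStrings, as.is = FALSE)))
  mydf
}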
Here it is in use:
makemeNA(df, "-")
# KY27PHY1 KY27PHY2 KY27PHY3
# 1 4 4 5
# 2 5 4 4
# 3 5 4 4
# 4 4 4 4
# 5 NA NA NA
# 6 4 5 5
# 7 2 2 1
# 8 3 3 4
# 9 5 5 5
# 10 NA NA NA
# 11 4 5 4
# 12 3 3 3
# 13 3 3 3
# 14 5 5 5
# 15 5 5 5
You can see from the structure (str) that we now have numeric (integer) output.
str(makemeNA(df, "-"))
# 'data.frame': 15 obs. of 3 variables:
# $ KY27PHY1: int 4 5 5 4 NA 4 2 3 5 NA ...
# $ KY27PHY2: int 4 4 4 4 NA 5 2 3 5 NA ...
# $ KY27PHY3: int 5 4 4 4 NA 5 1 4 5 NA ...
As with na.strings, the NAStrings argument of makemeNA is plural: it accepts a vector of several values. Here we make both the dash and the value "1" into NA.
str(makemeNA(df, c("-", 1)))
# 'data.frame': 15 obs. of 3 variables:
# $ KY27PHY1: int 4 5 5 4 NA 4 2 3 5 NA ...
# $ KY27PHY2: int 4 4 4 4 NA 5 2 3 5 NA ...
# $ KY27PHY3: int 5 4 4 4 NA 5 NA 4 5 NA ...
You can also use regular expressions to set values as NA, as below:
df1 <- data.frame(A = c(1, 2, "-", "not applicable", 5),
B = c("not available", 1, 2, 3, 4),
C = c("-", letters[1:4]))
Turn any value containing "not" or "-" into NA:
makemeNA(df1, "not.*|-", fixed = FALSE)
# A B C
# 1 1 NA <NA>
# 2 2 1 a
# 3 NA 2 b
# 4 NA 3 c
# 5 5 4 d
str(makemeNA(df1, "not.*|-", fixed = FALSE))
# 'data.frame': 5 obs. of 3 variables:
# $ A: int 1 2 NA NA 5
# $ B: int NA 1 2 3 4
# $ C: Factor w/ 4 levels "a","b","c","d": NA 1 2 3 4
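To make the regular-expression route easier to follow, here is a small sketch of its two steps on a single made-up character vector: matches are blanked with gsub, and the empty string is then passed to type.convert as the NA marker. The anchored pattern at the end is a suggestion rather than part of the function above; it is one way to avoid stripping the minus sign from negative numbers.

```r
# Hypothetical values mixing numbers and placeholders
x <- c("1", "not applicable", "-", "5")

# Step 1: blank out everything the pattern matches
blanked <- gsub("not.*|-", "", x)

# Step 2: let type.convert treat "" as NA and detect the column type
converted <- type.convert(blanked, na.strings = "", as.is = TRUE)
converted

# Caveat: the unanchored pattern also removes the sign of negative numbers
unanchored <- gsub("not.*|-", "", "-3")

# Anchoring the pattern to the whole value leaves "-3" intact
anchored <- gsub("^(not.*|-)$", "", c("-3", "-", "not available"))
```

If the data can contain negative numbers, the anchored form is the safer pattern to pass as NAStrings with fixed = FALSE.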