I'm running into issues converting columns of a data frame in R. I have a bunch of columns that were read as factors and contain % symbols.
I know that for a single column I could do:
df[,3] <- as.numeric(sub("%","",df[,3]))
But trying to apply this to the whole dataset does not seem to work and changes all the values to NA. What am I doing wrong? Here is the code I tried to use:
df[,-1] <- as.numeric(sub("%","",df[,-1]))
EDIT: I know I can solve this with:
for (i in 2:66) {
df[,i] <- as.numeric(sub("%","",df[,i]))
print(class(df[,i]))
}
But there has to be a more elegant (and hopefully one-line) way to do this.
EDIT 2: Here is some of the data:
Year v1 v2 v3 v4
1 12-Oct 0% 0% 39% 14%
2 12-Nov 0% 6% 59% 4%
3 12-Dec 22% 0% 37% 26%
4 13-Jan 45% 0% 66% 19%
5 13-Feb 28% 39% 74% 13%
ANSWERED: Here is how I did it in one command after you all helped me so much! I was having problems with specifying the function part.
df=read.csv("all response rates.csv")
df[-1]<-data.frame(apply(df[-1], 2, function(x)
as.numeric(sub("%","",as.character(x)))))
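The same one-command idea can be sketched with lapply instead of apply, which works column by column and so needs no intermediate matrix or data.frame() wrapper. This is only a sketch: it assumes the sample rows shown above and reads them from an inline string (the name txt is made up here) rather than from the CSV file.

```r
# Sample rows from the question, read from inline text for illustration only.
txt <- "Year v1 v2 v3 v4
1 12-Oct 0% 0% 39% 14%
2 12-Nov 0% 6% 59% 4%
3 12-Dec 22% 0% 37% 26%
4 13-Jan 45% 0% 66% 19%
5 13-Feb 28% 39% 74% 13%"
df <- read.table(text = txt, header = TRUE, stringsAsFactors = TRUE)

# Strip the literal percent sign and convert every column except the first.
df[-1] <- lapply(df[-1], function(x) {
  as.numeric(sub("%", "", as.character(x), fixed = TRUE))
})

str(df)
```

Assigning into df[-1] keeps the Year column and the data frame structure as they were.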
parse_number from the readr package will remove the % symbols. For your given data set, try:
library(dplyr)
library(readr)
res <- cbind(df %>% select(Year), # preserve the year column as-is
df %>% select(-Year) %>% mutate_all(funs(parse_number))
)
> res
Year v1 v2 v3 v4
1 12-Oct 0 0 39 14
2 12-Nov 0 6 59 4
3 12-Dec 22 0 37 26
4 13-Jan 45 0 66 19
5 13-Feb 28 39 74 13
If you don't need to preserve your first column, you only need the excerpt:
df %>% select(-Year) %>% mutate_all(funs(parse_number))
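In newer dplyr releases funs() is deprecated and mutate_all() is superseded by across(). A sketch of what an equivalent call might look like, assuming dplyr 1.0.0 or later and a small stand-in data frame. The as.character() step is an addition here, since parse_number() expects character input and the question's columns were read as factors.

```r
library(dplyr)
library(readr)

# Small stand-in for the question's data (illustrative values only).
df <- data.frame(Year = c("12-Oct", "12-Nov", "12-Dec"),
                 v1 = c("0%", "0%", "22%"),
                 v2 = c("0%", "6%", "0%"),
                 stringsAsFactors = TRUE)

# Convert every column except Year; Year is left untouched.
res <- df %>%
  mutate(across(-Year, ~ parse_number(as.character(.x))))
```

Because across(-Year, ...) leaves Year in place, the cbind() step is not needed in this form.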
Here is an option using set from data.table, which would be faster for big datasets as the overhead of [.data.table is avoided:
library(stringi)
library(data.table)
setDT(df)
for(j in 2:ncol(df)){
set(df, i=NULL, j=j, value= as.numeric(stri_extract(df[[j]], regex='\\d+')))
}
df
# Year v1 v2 v3 v4
#1: 12-Oct 0 0 39 14
#2: 12-Nov 0 6 59 4
#3: 12-Dec 22 0 37 26
#4: 13-Jan 45 0 66 19
#5: 13-Feb 28 39 74 13
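One caveat: the pattern \\d+ keeps only the first run of digits, so a value such as 12.5% would come back as 12. A sketch of the same set() loop that removes the % sign with base sub() instead, assuming the values may contain decimals. The sample data frame here is made up for illustration.

```r
library(data.table)

# Made-up sample with a decimal value, for illustration only.
df <- data.frame(Year = c("12-Oct", "12-Nov"),
                 v1 = c("12.5%", "0%"),
                 v2 = c("6%", "39%"))
setDT(df)

for (j in 2:ncol(df)) {
  # Drop the literal "%" and convert; as.character() also covers factor columns.
  set(df, i = NULL, j = j,
      value = as.numeric(sub("%", "", as.character(df[[j]]), fixed = TRUE)))
}
```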
Try this approach using functions from base:
# dummy data:
df<-data.frame(v1=c("78%", "65%", "32%"), v2=c("43%", "56%", "23%"))
# function
df2<-data.frame(lapply(df, function(x) as.numeric(sub("%", "", x))) )
As per the comments, this first strips away the percentage signs and then converts the columns from factors to numeric. I've changed the original answer from apply to lapply following @thelatemail's suggestions.
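For anyone wondering why the whole-data-frame attempt in the question returned NA: sub() expects a character vector, so a data frame passed to it is coerced with as.character(), which yields one deparsed string per column rather than the individual cell values. A small sketch of the difference, using the dummy data above (the names flat, bad and good are just for illustration):

```r
df <- data.frame(v1 = c("78%", "65%", "32%"),
                 v2 = c("43%", "56%", "23%"),
                 stringsAsFactors = TRUE)

# Whole data frame: coerced to one string per column, not one per cell.
flat <- as.character(df)
length(flat)                                    # same as ncol(df)
bad <- suppressWarnings(as.numeric(sub("%", "", flat)))

# Column by column: each factor is converted via its labels.
good <- lapply(df, function(x) as.numeric(sub("%", "", x)))
```

Here bad contains only NA, while good holds the numeric values for each column.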
Here is a one-line solution that assumes the data is in fixed-width columns. I needed to remove the first row of names since not all the columns had names. The widths of the columns are specified as integers (a negative value means skip that many characters). The call also sets the column classes to numeric during the read.
Your data:
1 12-Oct 0% 0% 39% 14%
2 12-Nov 0% 6% 59% 4%
3 12-Dec 22% 0% 37% 26%
4 13-Jan 45% 0% 66% 19%
5 13-Feb 28% 39% 74% 13%
The R one-line script:
adf <- read.fwf(file="a.dat",widths=c(-8,9,-1,7,-1,8,-1,8),colClasses=rep("numeric",4))
Output (the first column is the row numbering that R adds):
V1 V2 V3 V4
1 0 0 39 14
2 0 6 59 4
3 22 0 37 26
4 45 0 66 19
5 28 39 74 13