Remove commas from character vectors based on specific column names in R

I have a large dataframe. A smaller subset is as follows:

structure(list(Date = c("2017-08-12", "2017-08-12", "2017-08-12"
  ), `Time (sec)` = c("19:01:04", "07:30:18", "04:29:38"), `4+DURATION` = c("26", 
  "58,000", "27"), `4+'000 (AVG)` = c("0.0000", "0.0000", "0.0000"), 
  `15+DURATION` = c("26", "57,000", "27"), `15+'000 (AVG)` = c("0.0000", 
  "0.0000", "0.0000")), .Names = c("Date", "Time (sec)", "4+DURATION", 
   "4+'000 (AVG)", "15+DURATION", "15+'000 (AVG)"), row.names = 3:5, class = "data.frame")

The actual data frame looks like this:

        Date Time (sec) 4+DURATION 4+'000 (AVG) 15+DURATION 15+'000 (AVG)
3 2017-08-12   19:01:04         26       0.0000          26        0.0000
4 2017-08-12   07:30:18     58,000       0.0000      57,000        0.0000
5 2017-08-12   04:29:38         27       0.0000          27        0.0000

From column 3 onwards, the columns are stored as character vectors, and I am trying to convert them to numeric. This is the code I used:

cols.num <- names(dat[,-c(1:2)])
dat[cols.num] <- sapply(dat[cols.num],as.numeric)

dat is my data frame. This produces NA values in both duration columns wherever the character value contains a comma.
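The NA values come from as.numeric() itself, which does not accept a thousands separator. A minimal illustration, using the values from the sample data (the names x, direct and cleaned are only for this sketch):

```r
x <- c("26", "58,000", "27")

# as.numeric() cannot parse "58,000", so that element becomes NA
# (R also emits a coercion warning, silenced here)
direct <- suppressWarnings(as.numeric(x))
direct

# Removing the commas first lets every element parse
cleaned <- as.numeric(gsub(",", "", x, fixed = TRUE))
cleaned
```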

I tried to remove it by

df[,unique(grep("DUR", names(df), value=T))] <- gsub(",","",df[,unique(grep("DUR", names(df), value=T))])

But this creates a df like this

        Date Time (sec)           4+DURATION 4+'000 (AVG)          15+DURATION 15+'000 (AVG)
3 2017-08-12   19:01:04 c("26" "58000" "27")       0.0000 c("26" "57000" "27")        0.0000
4 2017-08-12   07:30:18 c("26" "57000" "27")       0.0000 c("26" "58000" "27")        0.0000
5 2017-08-12   04:29:38 c("26" "58000" "27")       0.0000 c("26" "57000" "27")        0.0000
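This result appears to follow from how gsub() treats non-character input: it coerces the whole object with as.character(), and for a data frame (a list of columns) that yields one deparsed string per column rather than one string per cell. A small sketch with a hypothetical two-column data frame named demo:

```r
demo <- data.frame(a = c("26", "58,000", "27"),
                   b = c("26", "57,000", "27"),
                   stringsAsFactors = FALSE)

# as.character() on a list of columns returns one deparsed string per column
deparsed <- as.character(demo)
length(deparsed)

# gsub() therefore sees two long strings and strips the commas that
# separate the elements as well as the thousands separators
broken <- gsub(",", "", demo, fixed = TRUE)
broken
```

Assigning those two strings back into a three-row selection recycles them across the cells, which is why every cell ends up holding a whole deparsed vector.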

But the desired output is:

        Date Time (sec) 4+DURATION 4+'000 (AVG) 15+DURATION 15+'000 (AVG)
3 2017-08-12   19:01:04         26       0.0000          26        0.0000
4 2017-08-12   07:30:18      58000       0.0000       57000        0.0000
5 2017-08-12   04:29:38         27       0.0000          27        0.0000

The problem is that I don't know in advance which columns hold the duration values, and the names of those columns keep changing, from 4+DURATION to 45+DURATION, etc. I want to remove the commas from every column with DURATION in its name before converting the columns to numeric with sapply.
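Selecting the columns by name rather than by position can be sketched as follows. The vector nms holds hypothetical column names (a 45+ prefix is assumed for illustration); in practice it would be names(dat).

```r
# Hypothetical column names; the numeric prefix before "DURATION" varies
nms <- c("Date", "Time (sec)", "45+DURATION", "45+'000 (AVG)", "15+DURATION")

# fixed = TRUE treats the pattern as a literal string, not a regular expression
dur_cols <- grep("DURATION", nms, value = TRUE, fixed = TRUE)
dur_cols
```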

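gsub() is vectorized over the elements of a character vector, but not over the columns of a data frame: given a data frame, it calls as.character() on the whole object, which collapses each column into a single string. Apply it to each column of interest with lapply() instead (sub() would also work for this sample, but it removes only the first comma in each value), i.e.

# Names of the columns to clean
cols <- grep("DUR", names(df), value = TRUE)
# Strip the commas column by column, then convert to numeric
df[cols] <- lapply(df[cols], function(i) as.numeric(gsub(",", "", i, fixed = TRUE)))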

which gives,

        Date Time (sec) 4+DURATION 4+'000 (AVG) 15+DURATION 15+'000 (AVG)
3 2017-08-12   19:01:04         26       0.0000          26        0.0000
4 2017-08-12   07:30:18      58000       0.0000       57000        0.0000
5 2017-08-12   04:29:38         27       0.0000          27        0.0000
#str(df)
#'data.frame':  3 obs. of  6 variables:
# $ Date         : chr  "2017-08-12" "2017-08-12" "2017-08-12"
# $ Time (sec)   : chr  "19:01:04" "07:30:18" "04:29:38"
# $ 4+DURATION   : num  26 58000 27
# $ 4+'000 (AVG) : chr  "0.0000" "0.0000" "0.0000"
# $ 15+DURATION  : num  26 57000 27
# $ 15+'000 (AVG): chr  "0.0000" "0.0000" "0.0000"
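Because the column names change between data sets, the same idea could be wrapped in a small function. This is only a sketch: strip_commas, sample_df and the 45+DURATION column are hypothetical names chosen for illustration. It uses gsub() so that values with more than one separator, such as "1,058,000", are handled too.

```r
# Hypothetical helper: convert every column whose name contains `pattern`
# from comma-grouped text to numeric
strip_commas <- function(data, pattern = "DURATION") {
  cols <- grep(pattern, names(data), fixed = TRUE)
  data[cols] <- lapply(data[cols], function(x) {
    as.numeric(gsub(",", "", x, fixed = TRUE))
  })
  data
}

# check.names = FALSE keeps the non-syntactic column name unchanged
sample_df <- data.frame(
  Date = c("2017-08-12", "2017-08-12"),
  `45+DURATION` = c("26", "1,058,000"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

cleaned_df <- strip_commas(sample_df)
str(cleaned_df)
```

Columns whose names do not contain the pattern, such as Date here, are left as they were.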

A dplyr solution:

d <- structure(list(Date = c("2017-08-12", "2017-08-12", "2017-08-12"
  ), `Time (sec)` = c("19:01:04", "07:30:18", "04:29:38"), `4+DURATION` = c("26", 
  "58,000", "27"), `4+'000 (AVG)` = c("0.0000", "0.0000", "0.0000"), 
  `15+DURATION` = c("26", "57,000", "27"), `15+'000 (AVG)` = c("0.0000", 
  "0.0000", "0.0000")), .Names = c("Date", "Time (sec)", "4+DURATION", 
   "4+'000 (AVG)", "15+DURATION", "15+'000 (AVG)"), row.names = 3:5, class = "data.frame")
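library(dplyr)
# across() needs dplyr >= 1.0.0; it replaces mutate_at() and the deprecated funs()
d2 <- d %>% mutate(across(contains('DURATION'), ~ as.numeric(gsub(',', '', .x, fixed = TRUE))))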
str(d2)
