Equivalent of this string replacement code in R?

Question

I have the following code that works really well to remove characters from the end of elements in a Python list:

x = ['01/01/2013 00:00:00','01/01/2013 00:00:00',
    '01/01/2013 00:00:00','01/01/2013 00:00:00',...]

Given that list, I want to remove the 00:00:00 part, so I wrote this:

i = 0
while i < len(x):
    x[i] = x[i][:x[i].find(' 00:00:00')]
    i += 1

This does the trick. How can I implement a similar solution in R? I've tried substr and gsub, but they run really slowly (the actual list has over 250,000 date/time combos).
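One caveat with the loop above: str.find returns -1 when the substring is absent, so the slice would then silently drop the last character of that element. A minimal sketch of a guarded version, assuming the goal is to strip the suffix only where it is present (the function name strip_midnight is made up for illustration):

```python
def strip_midnight(stamps, suffix=" 00:00:00"):
    """Return a new list with a trailing midnight time removed from each entry."""
    # Only slice when the string really ends with the suffix; other entries
    # are returned unchanged instead of losing their last character.
    return [s[:-len(suffix)] if s.endswith(suffix) else s for s in stamps]


x = ["01/01/2013 00:00:00", "01/01/2013 12:34:56"]
cleaned = strip_midnight(x)
print(cleaned)
```

With data where every element ends in 00:00:00, this should give the same result as the original loop.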

Answer 1

Try sub with fixed=TRUE:

x <- rep('01/01/2013 00:00:00', 250000)
system.time(y <- sub(" 00:00:00", "", x, fixed=TRUE))
#    user  system elapsed 
#    0.05    0.00    0.05

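y contains the result. The timing shows that 250,000 elements take about 0.05 seconds. fixed=TRUE makes sub treat the pattern as a literal string rather than a regular expression. See ?sub for help on the parameters.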
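For readers coming from the Python side of the question, a rough analogy (not an exact equivalence) is that sub with fixed=TRUE behaves like a literal str.replace limited to the first occurrence, while sub without it behaves like re.sub with count=1. The sketch below only illustrates that distinction; the variable names are made up:

```python
import re

stamp = "01/01/2013 00:00:00"

# Literal replacement of the first occurrence only (analogous to fixed=TRUE).
literal = stamp.replace(" 00:00:00", "", 1)

# Regex replacement of the first occurrence (analogous to the default mode).
by_regex = re.sub(r" 00:00:00", "", stamp, count=1)

# The two modes differ once the pattern contains regex metacharacters:
dotted = "a.b"
literal_dot = dotted.replace(".", "", 1)      # removes the literal dot
regex_dot = re.sub(r".", "", dotted, count=1)  # "." matches any character

print(literal, by_regex, literal_dot, regex_dot)
```

For the date strings here the pattern has no metacharacters, so both modes produce the same output and the literal one simply skips the regex machinery.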

Answer 2

Consider some sample data:

set.seed(144)
dat <- sample(c("01/01/2013 00:00:00", "01/01/2013 12:34:56"), 200000, replace=TRUE)
table(dat)
# dat
# 01/01/2013 00:00:00 01/01/2013 12:34:56 
#              100100               99900

Here, we want to remove the trailing 00:00:00 but keep the trailing 12:34:56.

You could first find 00:00:00 at the end of the string with the following (runs in ~0.1 seconds on my computer):

to.clean <- grepl(" 00:00:00$", dat)

Now you can use substr to remove the relevant trailing characters (runs in ~0.04 seconds on my computer):

# drop the last 9 characters, i.e. the length of " 00:00:00"
dat[to.clean] <- substr(dat[to.clean], 1, nchar(dat[to.clean])-9)
table(dat)
# dat
#          01/01/2013 01/01/2013 12:34:56 
#              100100               99900

Alternatively, the following more compact gsub command does both steps at once and also runs in about 0.15 seconds for these 200,000 date/time pairs:

cleaned <- gsub(" 00:00:00$", "", dat)
table(cleaned)
# cleaned
#          01/01/2013 01/01/2013 12:34:56 
#              100100               99900

It's possible that you were looping through the data and calling substr or gsub separately on each element of your vector. That would be expected to be much slower, since it doesn't take advantage of vectorization.

2 answers

solution1
2 2016-01-29 22:22:51

solution2
2 2016-01-29 22:24:39
