gsub R extracting string

Question

I am trying to extract a string between two commas with gsub. If I have the following

xz<- "1620 Honeylocust Drive, 60210 IL, USA"

and I want to extract everything between the two commas (60210 IL), is it possible to use gsub?

I have tried

gsub(".*,","",xz)

The result is USA. How can I do it?

Answer 1

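The pattern has two alternatives separated by |. The first, ^[^,]*,\\s*, starts at the beginning of the string ( ^ ) and matches zero or more characters that are not a comma ( [^,]* ), followed by a comma and zero or more whitespace characters. The second, ,[^,]*$, matches a comma followed by zero or more characters that are not a comma, up to the end of the string ( $ ). gsub replaces both matches with an empty string ( "" ), which leaves only the text between the two commas.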

gsub("^[^,]*,\\s*|,[^,]*$", "", xz)
#[1] "60210 IL"
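To see what each alternative removes, the same result can be reached in two separate steps. This is only an illustrative sketch: the names step1 and step2 are not part of the original answer, and sub is used instead of gsub because each pattern can match only once here.

```r
xz <- "1620 Honeylocust Drive, 60210 IL, USA"

# First alternative: remove everything up to and including the first
# comma, plus any whitespace that follows it
step1 <- sub("^[^,]*,\\s*", "", xz)
step1
#[1] "60210 IL, USA"

# Second alternative: remove the last comma and everything after it
step2 <- sub(",[^,]*$", "", step1)
step2
#[1] "60210 IL"
```

Note that this relies on the string containing exactly two commas; with more commas, the middle part would still contain the extra fields.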

Another option is to use sub with a capture group

sub("^[^,]+,\\s+([^,]+).*", "\\1", xz)
#[1] "60210 IL"

Another option is regexpr with regmatches

regmatches(xz, regexpr("(?<=,\\s)[^,]*(?=,)", xz, perl = TRUE))
#[1] "60210 IL"

Or with str_extract from stringr

library(stringr)
str_extract(xz, "(?<=,\\s)[^,]*(?=,)")
#[1] "60210 IL"

Update

With the new string,

xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"
sub(".*,\\s+([0-9]+[^,]+).*", "\\1", xz1)
#[1] "60210 IL"
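As a quick check, a pattern of this shape (written here with a plain capture group) can be run against both the original and the updated string, since sub is vectorized. The names pat and addresses are made up for this sketch, and it assumes the part to extract is the only comma-separated field that starts with digits after a comma.

```r
# Capture a field that follows a comma and starts with one or more digits
pat <- ".*,\\s+([0-9]+[^,]+).*"

addresses <- c("1620 Honeylocust Drive, 60210 IL, USA",
               "1620, Honeylocust Drive, 60210 IL, USA")

# sub() works element by element on the character vector
res <- sub(pat, "\\1", addresses)
res
#[1] "60210 IL" "60210 IL"
```

If a string does not match the pattern at all, sub returns it unchanged, so it may be worth checking with grepl(pat, addresses) first.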

Answer 2

You could also do this using strsplit and grep (here I did it in 2 lines for readability):

xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"
a1 <- strsplit(xz1, "[ ]*,[ ]*")[[1]]
grep("^[0-9]+[ ]+[A-Z]+", a1, value=TRUE)
#[1] "60210 IL"

It's not using gsub, and in the present case it is not better, but maybe it is easier to adapt to other situations.
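As one example of adapting it, the split fields can be picked by position instead of by pattern. This is a sketch that assumes the wanted part is always the second-to-last field; the names fields and zip_state are only for illustration.

```r
xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"

# Split on commas, dropping the spaces around them
fields <- strsplit(xz1, "[ ]*,[ ]*")[[1]]
fields
#[1] "1620"              "Honeylocust Drive" "60210 IL"          "USA"

# Take the second-to-last field, however many fields come before it
zip_state <- fields[length(fields) - 1]
zip_state
#[1] "60210 IL"
```

This gives the same result for the original string with only two commas, because the position is counted from the end.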
