Extracting a string in R with gsub

Question

I am trying to extract a string between two commas with gsub. If I have the following

xz<- "1620 Honeylocust Drive, 60210 IL, USA"

and I want to extract everything between the two commas ( 60210 IL ), is it possible to use gsub?

I have tried

gsub(".*,","",xz)

The result is USA. How can I do it?

Answer 1

We can match zero or more characters that are not a comma ( [^,]* ), followed by a comma and zero or more whitespace characters, anchored at the start ( ^ ) of the string, or ( | ) a comma followed by zero or more characters that are not a comma ( [^,]* ), anchored at the end ( $ ) of the string, and replace both matches with an empty string ( "" ).

gsub("^[^,]*,\\s*|,[^,]*$", "", xz)
#[1] "60210 IL"
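
To see why the attempt in the question leaves only the country, and what each branch of the alternation removes, the pieces can be run separately. This is a minimal sketch in base R; the variable names (greedy, no_head, no_tail, both) are illustrative and not from the original post.

```r
xz <- "1620 Honeylocust Drive, 60210 IL, USA"

# The attempt from the question: .* is greedy, so ".*," consumes everything
# up to the LAST comma and only the final field (with its leading space) remains.
greedy <- gsub(".*,", "", xz)
greedy
#[1] " USA"

# First branch alone: drop the first field, its comma, and the whitespace after it.
no_head <- sub("^[^,]*,\\s*", "", xz)
no_head
#[1] "60210 IL, USA"

# Second branch alone: drop the last comma and everything after it.
no_tail <- sub(",[^,]*$", "", xz)
no_tail
#[1] "1620 Honeylocust Drive, 60210 IL"

# Both branches joined with | and applied globally leave the middle field.
both <- gsub("^[^,]*,\\s*|,[^,]*$", "", xz)
both
#[1] "60210 IL"
```

Because the two branches are anchored to opposite ends of the string, this only isolates the middle field when the string has exactly two commas.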

Another option is to use sub and capture the middle field as a group

sub("^[^,]+,\\s+([^,]+).*", "\\1", xz)
#[1] "60210 IL"

Or another option is regexpr/regmatches

regmatches(xz, regexpr("(?<=,\\s)[^,]*(?=,)", xz, perl = TRUE))
#[1] "60210 IL"
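
One thing to keep in mind when applying this to a vector: regmatches with a regexpr result returns only the strings that matched, so the output can be shorter than the input. The sketch below, using a made-up vector addrs, shows one way to keep the result aligned with the input by filling non-matches with NA; it is an illustration, not part of the original answer.

```r
# Hypothetical input: the second element has no commas at all.
addrs <- c("1620 Honeylocust Drive, 60210 IL, USA",
           "no commas here",
           "A, B, C")

# regexpr returns -1 for elements without a match.
m <- regexpr("(?<=,\\s)[^,]*(?=,)", addrs, perl = TRUE)

# regmatches silently drops the non-matching element.
matched_only <- regmatches(addrs, m)
matched_only
#[1] "60210 IL" "B"

# Keep one result per input element, with NA where nothing matched.
aligned <- rep(NA_character_, length(addrs))
aligned[m > 0] <- regmatches(addrs, m)
aligned
#[1] "60210 IL" NA         "B"
```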

Or with str_extract from stringr

library(stringr)
str_extract(xz, "(?<=,\\s)[^,]*(?=,)")
#[1] "60210 IL"

Update

With the new string, which has an extra comma after the house number, capture the field that starts with digits:

xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"
sub(".*,\\s+([0-9]+[^,]+).*", "\\1", xz1)
#[1] "60210 IL"

Answer 2

You could also do this using strsplit and grep (here I did it in 2 lines for readability):

xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"
a1 <- strsplit(xz1, "[ ]*,[ ]*")[[1]]
grep("^[0-9]+[ ]+[A-Z]+", a1, value=TRUE)
#[1] "60210 IL"

It's not using gsub, and in the present case it is not better, but it may be easier to adapt to other situations.
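
As a rough illustration of that adaptability, the split can be wrapped in small helpers that pick a field either by position or by pattern. The function names below (split_fields, pick_by_position, pick_by_pattern) are hypothetical and not from the original answer. Note that the pattern from the answer also matches the first field of the original string xz ("1620 Honeylocust Drive" starts with digits, a space and a capital letter), so the sketch adds a trailing $ for that case.

```r
# Hypothetical helpers, not part of the original answers.
# Split on commas, swallowing any spaces around them.
split_fields <- function(x) {
  strsplit(x, "[ ]*,[ ]*")[[1]]
}

# Pick the n-th comma-separated field.
pick_by_position <- function(x, n) {
  split_fields(x)[n]
}

# Pick every field that matches a regular expression.
pick_by_pattern <- function(x, pattern) {
  grep(pattern, split_fields(x), value = TRUE)
}

xz  <- "1620 Honeylocust Drive, 60210 IL, USA"
xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"

pos_xz  <- pick_by_position(xz, 2)    # "60210 IL"
pos_xz1 <- pick_by_position(xz1, 3)   # "60210 IL"

# Without the end anchor, the street field of xz matches as well.
loose  <- pick_by_pattern(xz, "^[0-9]+[ ]+[A-Z]+")
# Anchoring with $ requires the whole field to be digits, spaces, capitals.
strict <- pick_by_pattern(xz, "^[0-9]+[ ]+[A-Z]+$")
```

Selecting by position is the simplest choice when the number of fields is fixed; selecting by pattern tolerates a varying number of commas, as in xz1.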
