gsub R extracting string

I am trying to extract a string between two commas with gsub. If I have the following

xz <- "1620 Honeylocust Drive, 60210 IL, USA"

and I want to extract everything between the two commas (60210 IL), is it possible to use gsub?

I have tried

gsub(".*,","",xz)

The result is USA. How can I do it?

We can match zero or more characters that are not a comma ([^,]*), followed by a comma and zero or more whitespace characters (\\s*), from the start (^) of the string, or (|) a comma followed by zero or more characters that are not a comma ([^,]*) at the end ($) of the string, and replace the matches with an empty string ("").

gsub("^[^,]*,\\s*|,[^,]*$", "", xz)
#[1] "60210 IL"
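
To see what each branch of the alternation removes, the two halves can be applied one at a time. This is only an illustrative sketch on the same xz; the names step1 and step2 are made up here and are not part of the original answer.

```r
xz <- "1620 Honeylocust Drive, 60210 IL, USA"

# First branch: remove everything up to and including the first comma,
# plus any whitespace that follows it.
step1 <- sub("^[^,]*,\\s*", "", xz)
step1
#[1] "60210 IL, USA"

# Second branch: remove the last comma and everything after it.
step2 <- sub(",[^,]*$", "", step1)
step2
#[1] "60210 IL"
```

gsub with the combined pattern does both removals in a single call, because each branch matches at a different position in the string.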

Another option is to use sub and capture the wanted part as a group

sub("^[^,]+,\\s+([^,]+).*", "\\1", xz)
#[1] "60210 IL"

Another option is regexpr/regmatches

regmatches(xz, regexpr("(?<=,\\s)[^,]*(?=,)", xz, perl = TRUE))
#[1] "60210 IL"

Or with str_extract from stringr

library(stringr)
str_extract(xz, "(?<=,\\s)[^,]*(?=,)")
#[1] "60210 IL"

Update

With the new string,

xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"
sub(".*,\\s+([0-9]+[^,]+).*", "\\1", xz1)
#[1] "60210 IL"

You could also do this using strsplit and grep (here I did it in 2 lines for readability):

xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"
a1 <- strsplit(xz1, "[ ]*,[ ]*")[[1]]
grep("^[0-9]+[ ]+[A-Z]+", a1, value=TRUE)
#[1] "60210 IL"

It's not using gsub, and in the present case it is not better, but it may be easier to adapt to other situations.
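
As one example of adapting the split-based approach, the field can be picked by position instead of by pattern. The helper nth_field below is hypothetical (it is not from the original answers) and assumes the fields are separated by plain commas.

```r
# Hypothetical helper: return the k-th comma-separated field of a
# single string, with surrounding whitespace trimmed.
nth_field <- function(x, k) {
  parts <- strsplit(x, ",", fixed = TRUE)[[1]]
  trimws(parts[k])
}

xz <- "1620 Honeylocust Drive, 60210 IL, USA"
xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"

nth_field(xz, 2)
#[1] "60210 IL"
nth_field(xz1, 3)
#[1] "60210 IL"
```

Selecting by position only works when the number of fields is known in advance; the grep version above is the safer choice when it is not.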
