gsub R extracting string
I am trying to extract a string between two commas with gsub. If I have the following
xz<- "1620 Honeylocust Drive, 60210 IL, USA"
and I want to extract everything between the two commas (60210 IL), is it possible to use gsub?
I have tried
gsub(".*,","",xz)
The result is " USA" (everything after the last comma). How can I do it?
We can match zero or more characters that are not a , ([^,]*) followed by a , and zero or more spaces, anchored at the start (^) of the string, or (|) a , followed by zero or more characters that are not a , ([^,]*) at the end ($) of the string, and replace both matches with blank ("").
gsub("^[^,]*,\\s*|,[^,]*$", "", xz)
#[1] "60210 IL"
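To see what each half of the alternation removes, the two branches can be applied one at a time with sub. This is only an illustrative sketch; the intermediate names no_prefix and middle are made up here and are not part of the original answer.

```r
xz <- "1620 Honeylocust Drive, 60210 IL, USA"

# Branch 1: drop everything up to and including the first comma,
# plus any whitespace that follows it
no_prefix <- sub("^[^,]*,\\s*", "", xz)
no_prefix
#[1] "60210 IL, USA"

# Branch 2: drop the last comma and everything after it
middle <- sub(",[^,]*$", "", no_prefix)
middle
#[1] "60210 IL"
```

gsub with the combined pattern does both removals in a single call, which is why it is needed here rather than sub.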
Or another option is using sub and capturing the wanted part as a group
sub("^[^,]+,\\s+([^,]+).*", "\\1", xz)
#[1] "60210 IL"
Or another option is regexpr/regmatches
regmatches(xz, regexpr("(?<=,\\s)[^,]*(?=,)", xz, perl = TRUE))
#[1] "60210 IL"
Or with str_extract from stringr
library(stringr)
str_extract(xz, "(?<=,\\s)[^,]*(?=,)")
#[1] "60210 IL"
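Note that the lookaround pattern returns the first piece that sits between two commas, so it depends on the string having exactly three fields. A small sketch with a hypothetical four-field string (the names pat, first_match and all_matches are assumptions for illustration) shows the difference, using base R only:

```r
xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"
pat <- "(?<=,\\s)[^,]*(?=,)"

# regexpr() only reports the first match
first_match <- regmatches(xz1, regexpr(pat, xz1, perl = TRUE))
first_match
#[1] "Honeylocust Drive"

# gregexpr() reports every match; the last one is the field before the country
all_matches <- regmatches(xz1, gregexpr(pat, xz1, perl = TRUE))[[1]]
all_matches
#[1] "Honeylocust Drive" "60210 IL"
tail(all_matches, 1)
#[1] "60210 IL"
```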
With the new string,
xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"
sub(".*,\\s+([0-9]+[^,]+).*", "\\1", xz1)
#[1] "60210 IL"
You could also do this using strsplit and grep (here I did it in 2 lines for readability):
xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"
a1 <- strsplit(xz1, "[ ]*,[ ]*")[[1]]
grep("^[0-9]+[ ]+[A-Z]+", a1, value=TRUE)
#[1] "60210 IL"
It's not using gsub, and in the present case it is not better, but it may be easier to adapt to other situations.
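As one possible adaptation of the split-based idea, the field can be picked by position instead of by pattern. The sketch below assumes the wanted piece is always the second-to-last comma-separated field, which is an assumption and not something stated in the question; the name fields is also made up here.

```r
xz1 <- "1620, Honeylocust Drive, 60210 IL, USA"

# Split on literal commas, then strip the surrounding whitespace of each field
fields <- trimws(strsplit(xz1, ",", fixed = TRUE)[[1]])
fields
#[1] "1620"              "Honeylocust Drive" "60210 IL"          "USA"

# Assumption: the postal code and state are always the second-to-last field
fields[length(fields) - 1]
#[1] "60210 IL"
```

The same two lines also work on the original three-field string, since the index is counted from the end.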