Capturing and extracting a regex group in R
I have a set of data that looks like this:
text_string <- structure(list(text_string = c("A Nanny-Back Up Care and Staffing Company-San Diego, OC, LA, San Francisco, Portland, Las Vegas, Phoenix, Seattle, Denver and NY. @jefffoes",
"Creative Producer of @crwnmag LA-NY-TX dereksith@googke.com Founded @marcusharper",
"daily elements for life and style texas transplant in california LA lauren@gmail.com read my blog + shop my instagram",
"LIVE, LAUGH, LOVE")), class = "data.frame", row.names = c(NA,
-4L))
I am trying to capture each instance of "LA" in the string and create a new field with it. The regex I used should return a match of "LA" for the first three strings and no match for the last one.
I thought this code would do the trick, but it does not:
library(dplyr)
library(stringr)
text_string_new <- text_string %>% mutate(new_field = str_replace(string = text_string,
                                                                  pattern = "(LA)(\\b|,)",
                                                                  replacement = "\\1"))
All that seems to do is return an exact copy of the text_string field.
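A minimal sketch of why the copy comes back, using two shortened strings of my own (the vector `x` is illustrative, not the original data frame). str_replace only rewrites the matched portion and leaves the rest of the string in place; because the zero-width `\\b` alternative succeeds first, the match is just "LA", and replacing it with group 1 puts "LA" right back:

```r
library(stringr)

# Illustrative inputs (shortened from the question's data)
x <- c("Creative Producer of @crwnmag LA-NY-TX", "LIVE, LAUGH, LOVE")

# Only the matched part is replaced; everything around it is kept.
# The match is "LA" and the replacement "\\1" is also "LA",
# so the first string is unchanged. The second has no match at all.
replaced <- str_replace(x, "(LA)(\\b|,)", "\\1")

identical(replaced, x)
```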
Using str_extract rather than str_replace seems to do the trick.
text_string %>% mutate(new_field = str_extract(string = text_string,
                                               pattern = "(LA)(\\b|,)"))
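For reference, here is a small sketch of how the extraction behaves on a plain character vector. The vector `x` and the simplified pattern `\\bLA\\b` are my own assumptions for illustration; str_match is shown as one way to pull out a specific capture group rather than the whole match:

```r
library(stringr)

# Illustrative inputs (shortened from the question's data)
x <- c("San Diego, OC, LA, San Francisco",
       "LA-NY-TX dereksith@googke.com",
       "LIVE, LAUGH, LOVE")

# str_extract() returns the first match per string, or NA when there is none.
first_match <- str_extract(x, "\\bLA\\b")

# str_match() returns a matrix: column 1 is the full match, column 2 is group 1.
group_one <- str_match(x, "(LA)(\\b|,)")[, 2]

# str_count() counts every match in each string.
n_matches <- str_count(x, "\\bLA\\b")
```

The "LA" in "LAUGH" is not picked up because it is followed by another word character, so neither `\\b` nor the comma alternative can match there.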