Extract a substring in R

> ldata2[2]
    [1] "  \"pretty\": \"5:06 PM GMT on June 18, 2015\","
# Need to extract only the time information. In this case "5:06 PM GMT on June 18, 2015"
# My attempt
> time <- sub(".* :\"(.*)".*","\\1",ldata2[1])

This is the error message I get: Error: unexpected symbol in "time <- sub(".* :\"(.*)".". Help appreciated.

library(stringr)
str_match(x, ': \\"(.*)\\"')[2]
#[1] "5:06 PM GMT on June 18, 2015"

cat was used as a reference when creating the regex pattern.

x <- "  \"pretty\": \"5:06 PM GMT on June 18, 2015\","
cat(x)
"pretty": "5:06 PM GMT on June 18, 2015",

The backslashes are gone: they only appear when R prints the string and are not part of its contents, so I don't reference them in my regex. The pattern ': \\"(.*)\\"' starts with a colon, a space and a double quote. The colon and space need no escaping. A double quote has no special regex meaning either; the two backslashes in front of it are an optional escape, and the regex engine treats \" as a literal double quote. Next come the capture group and another escaped double quote.
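
To make the point about the backslashes concrete, here is a minimal sketch in base R. The variable y is introduced only for illustration and is not part of the original answer; it holds the same text written as a single-quoted literal, which needs no escapes at all.

```r
x <- "  \"pretty\": \"5:06 PM GMT on June 18, 2015\","

# print() shows the escaped form; writeLines() shows the characters actually stored
print(x)
writeLines(x)

# The string contains no backslash characters (fixed = TRUE searches for a literal "\")
has_backslash <- grepl("\\", x, fixed = TRUE)
has_backslash

# y is a hypothetical name: the same string as a single-quoted literal
y <- '  "pretty": "5:06 PM GMT on June 18, 2015",'
identical(x, y)
```

Because the backslashes exist only in the source code and in the printed representation, the regex has to match plain double quotes, not backslashes.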

With sub:

sub('.*: \\"(.*)\\",', '\\1', x)
[1] "5:06 PM GMT on June 18, 2015"

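Once the inner double quotes are escaped, your pattern still does not match the string: it expects a space before the colon (" :"), while the string has the space after it (": "), so nothing is replaced. Here is the correct pattern: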

sub(".*: \"(.*)\".*", "\\1", ldata2[2])
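
If the same extraction has to run over several lines, the pattern can be wrapped in a small function. This is only a sketch: extract_quoted_value is a hypothetical name, and returning NA for lines that do not match is my assumption, not something from the thread. The pattern is written in single quotes so the double quotes need no backslashes.

```r
# Hypothetical helper, not from the original answers
extract_quoted_value <- function(lines) {
  # colon, space, then everything between the first and last double quote after it
  pattern <- '.*: "(.*)".*'
  # sub() returns its input unchanged when nothing matches, so test with grepl() first
  ifelse(grepl(pattern, lines), sub(pattern, "\\1", lines), NA_character_)
}

x <- "  \"pretty\": \"5:06 PM GMT on June 18, 2015\","
extract_quoted_value(x)
extract_quoted_value(c(x, "no match here"))
```

The grepl() check matters because sub() silently hands back the whole line when the pattern fails, which is easy to mistake for a successful extraction.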
