
Regex OR condition for text in R

Say I have these two texts:

1) "Project:ABC is located near CBA, being too far from city  "
2) "P r o j e c t : PQR is located near RQP, highlights some greenary"

I want to extract the text between the word "Project" and the "," so that my output is "ABC is located near CBA" for text 1 and "PQR is located near RQP" for text 2. For this I used the following regex:

x="Project:ABC is located near CBA, being too far from city  "
sub(".*Project: *(.*?) *, .*", "\\1", x)
Output:
ABC is located near CBA

But for text 2) it does not give the correct output. How can I include an OR condition so that both cases are handled? Any suggestions would be helpful. Thanks.

You can use a regex with lookahead and lookbehind assertions.

A small example using stringr:

Vec <- c("Project:ABC is located near CBA, being too far from city", 
         "P r o j e c t : PQR is located near RQP, highlights some greenary")
library(stringr)
str_extract(Vec, "(?<=:).*(?=,)")
#> [1] "ABC is located near CBA"  " PQR is located near RQP"

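If your input is more complex, you should adjust the regex, because it may not be strict enough (currently it takes everything between the first : and the last ,). Note also that the second result keeps the space that follows the colon.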
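A minimal sketch of that caveat, using base R's regexpr() with perl = TRUE in place of stringr so that it runs without extra packages. The input string x is made up for illustration and contains two commas; swapping .* for [^,]* is one possible way to stop at the first comma, not the only one.

```r
# Hypothetical input with two commas after the colon
x <- "Project:ABC is located near CBA, being too far, from city"

# Greedy .* runs from the first ":" up to the LAST ","
greedy <- regmatches(x, regexpr("(?<=:).*(?=,)", x, perl = TRUE))

# [^,]* cannot cross a comma, so the match stops at the FIRST ","
strict <- regmatches(x, regexpr("(?<=:)[^,]*(?=,)", x, perl = TRUE))

print(greedy)
#> [1] "ABC is located near CBA, being too far"
print(strict)
#> [1] "ABC is located near CBA"
```

If the colon may be followed by spaces, wrapping the result in trimws() removes them.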

Make your regex more flexible: [^:]+:\\s*([^,]+),.*

> sub("[^:]+:\\s*([^,]+),.*", "\\1", "P r o j e c t : PQR is located near RQP, highlights some greenary")
[1] "PQR is located near RQP"

> sub("[^:]+:\\s*([^,]+),.*", "\\1", "Project:ABC is located near CBA, being too far from city  ")
[1] "ABC is located near CBA"
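One thing to keep in mind with this approach: sub() returns the input unchanged when the pattern does not match, which is why the original attempt appeared to "fail silently" on the second text. The sketch below applies the same pattern to a whole vector and marks non-matching elements as NA; the third input string is a made-up example, and returning NA is just one possible convention.

```r
pat <- "[^:]+:\\s*([^,]+),.*"

x <- c("Project:ABC is located near CBA, being too far from city  ",
       "P r o j e c t : PQR is located near RQP, highlights some greenary",
       "no delimiters here")  # hypothetical input without ":" or ","

# sub() leaves non-matching strings untouched, so flag them explicitly
res <- ifelse(grepl(pat, x), sub(pat, "\\1", x), NA_character_)
print(res)
```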

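One base R option is gsub: match characters (.*) up to the : followed by zero or more whitespace characters (\\s*), or (|) a , followed by any other characters, and replace each match with an empty string ("").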

gsub(".*:\\s*|,.*", "", Vec)
#[1] "ABC is located near CBA" "PQR is located near RQP"
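To see what each side of the alternation contributes, the same result can be reproduced in two separate sub() calls. This is only an illustrative decomposition of the pattern above; the variable names step1 and step2 are made up.

```r
x <- "P r o j e c t : PQR is located near RQP, highlights some greenary"

# Left alternative: drop everything up to the colon plus any spaces after it
step1 <- sub(".*:\\s*", "", x)
print(step1)
#> [1] "PQR is located near RQP, highlights some greenary"

# Right alternative: drop the first comma and everything after it
step2 <- sub(",.*", "", step1)
print(step2)
#> [1] "PQR is located near RQP"

# The single gsub() call with "|" does both removals in one pass
one_pass <- gsub(".*:\\s*|,.*", "", x)
```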

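If we need to match Project specifically and then the : (using the three-element Vec defined at the end of this answer; there the greedy .*: above would run up to the last colon in the third string):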

# Allow optional whitespace between the letters of "Project" and on both sides of the colon
pat <- paste0(gsub("", "\\\\s*", "Project"), "\\s*:\\s*|\\s*,.*")
gsub(pat, "", Vec)
#[1] "ABC is located near CBA" "PQR is located near RQP" "Ganga gnd A3 And 3.."   

Data

Vec <- c("Project:ABC is located near CBA, being too far from city", 
 "P r o j e c t : PQR is located near RQP, highlights some greenary", 
 "Project: Ganga gnd A3 And 3.., Plot Bearing / CTS / Survey / Final Plot No.: Sr No"
 )
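As a sketch of an alternative way to build the same kind of pattern, the letters of "Project" can be split with strsplit() and joined with optional whitespace, which makes the resulting regex easy to inspect. The name word_pat is made up for this example, and the extra \\s* before the colon is an assumption added to cover inputs such as "t :".

```r
Vec <- c("Project:ABC is located near CBA, being too far from city",
         "P r o j e c t : PQR is located near RQP, highlights some greenary",
         "Project: Ganga gnd A3 And 3.., Plot Bearing / CTS / Survey / Final Plot No.: Sr No")

# Letters of "Project" joined by optional whitespace:
# the regex P\s*r\s*o\s*j\s*e\s*c\s*t
word_pat <- paste(strsplit("Project", "")[[1]], collapse = "\\s*")

# Remove either the (possibly spaced-out) "Project :" prefix,
# or the first comma and everything after it
pat <- paste0(word_pat, "\\s*:\\s*|\\s*,.*")
res <- gsub(pat, "", Vec)
print(res)
```

Because the prefix alternative is anchored on the letters of "Project" rather than on .*, the later colon in "No.:" in the third string is never reached.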

If the word Project is not important:

> text
[1] "Project:ABC is located near CBA, being too far from city  "
> substr(text,grep(":",strsplit(text,'')[[1]]),grep(",",strsplit(text,'')[[1]]))
[1] ":ABC is located near CBA,"
> substr(text,grep(":",strsplit(text,'')[[1]])+1,grep(",",strsplit(text,'')[[1]])-1)
[1] "ABC is located near CBA"
> text <- "P r o j e c t : PQR is located near RQP, highlights some greenary"
> substr(text,grep(":",strsplit(text,'')[[1]])+1,grep(",",strsplit(text,'')[[1]])-1)
[1] " PQR is located near RQP"

This should work fine!
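The same position-based idea can be wrapped in a small helper. This is a sketch under stated assumptions: extract_between is a hypothetical name, both delimiters are assumed to be single characters, the first occurrence of each is used, and surrounding whitespace is trimmed. It uses regexpr() with fixed = TRUE instead of splitting the string into characters.

```r
# Hypothetical helper: text between the first `open` and the next `close`
extract_between <- function(text, open = ":", close = ",") {
  # Position of the first opening delimiter (-1 if absent)
  start <- as.integer(regexpr(open, text, fixed = TRUE))
  if (start == -1) return(NA_character_)

  # Look for the closing delimiter only after the opening one
  rest <- substring(text, start + 1)
  end <- as.integer(regexpr(close, rest, fixed = TRUE))
  if (end == -1) return(NA_character_)

  # Keep what lies between the two and strip surrounding spaces
  trimws(substr(rest, 1, end - 1))
}

r1 <- extract_between("Project:ABC is located near CBA, being too far from city  ")
r2 <- extract_between("P r o j e c t : PQR is located near RQP, highlights some greenary")
r3 <- extract_between("no delimiters here")
print(c(r1, r2, r3))
```

For a vector of texts, the helper can be applied element by element, for example with vapply(texts, extract_between, character(1)).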
