Regular expression for addresses in R

Question

I want to extract the substring that starts with CALLE or CARRERA and ends at the last number found. I can't figure out this regex:

input 1: CALLE 15 # 21-32 APARTAMENTO SEGUNDO PISO

output 1: CALLE 15 # 21-32

input 2: THIS STRING WON'T MATCH

output 2: THIS STRING WON'T MATCH

I then want to replace each element of the vector with its matched substring. If no substring is matched, the original string should be left as it was.

This is what I've tried:

df$DirRes2 <- regmatches(df$DirRes2, regexpr("(CALLE.*\\d | CARRERA.*\\d | .*)", df$DirRes2))

Answer 1

This can be done with base R:

sub(".*((?i)(CALLE|CARRERA).*[0-9])[^0-9]+$", "\\1", str1, perl = TRUE)
#[1] "CALLE 15 # 21-32"        "THIS STRING WON'T MATCH" "Calle 25"

Or using str_extract from stringr:

library(stringr)
v1 <- trimws(str_extract(str1, "(?i)(CALLE|CARRERA)\\s*[0-9]+\\s*#*\\s*[0-9-]*"))
ifelse(is.na(v1), str1, v1)
#[1] "CALLE 15 # 21-32"        "THIS STRING WON'T MATCH" "Calle 25"
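For readers who prefer to stay close to the regexpr/regmatches approach from the question, the following is a minimal sketch of the same "extract or keep the original" logic in base R. The helper name extract_address and the simplified pattern are assumptions made for illustration; they are not part of the original answer.

```r
# Hypothetical helper: return the address part of each string,
# or the original string when no address is found.
extract_address <- function(x) {
  # Start at CALLE or CARRERA, run greedily to the last digit in the string
  pattern <- "(CALLE|CARRERA).*[0-9]"
  m <- regexpr(pattern, x, ignore.case = TRUE, perl = TRUE)

  # regexpr() returns -1 where nothing matched (NA for NA input)
  hit <- !is.na(m) & m != -1

  # regmatches() returns only the matched pieces, so assign them
  # back into the positions that actually matched
  out <- x
  out[hit] <- regmatches(x, m)
  out
}

str1 <- c("CALLE 15 # 21-32 APARTAMENTO SEGUNDO PISO",
          "THIS STRING WON'T MATCH", "Calle 25 Something")
extract_address(str1)
```

Because regmatches drops the elements without a match, assigning its result directly to the whole column would fail or misalign whenever some strings do not match; indexing with hit avoids that.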

Update

Based on the new pattern provided by the OP in the comments, @Jota's modified version works:

sub(".*?((?:CALLE|CARRERA).*\\d).*$", "\\1", str2, perl = TRUE, ignore.case = TRUE) 
#[1] "CALLE 15 # 21-32"        "THIS STRING WON'T MATCH" "Calle 25"  
#[4] "CALLE 18 CARRERA 7"
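The difference the leading `.*?` makes can be seen on a string that contains both keywords. The sketch below compares a greedy and a lazy prefix on one such string; the variable names x, greedy and lazy are chosen here for illustration only.

```r
x <- "CALLE 18 CARRERA 7 CONDOMINIO BELLO"

# Greedy prefix: ".*" consumes as much as it can, so the capture group
# starts at the LAST keyword in the string
greedy <- sub(".*((?:CALLE|CARRERA).*\\d).*$", "\\1", x,
              perl = TRUE, ignore.case = TRUE)

# Lazy prefix: ".*?" consumes as little as it can, so the capture group
# starts at the FIRST keyword in the string
lazy <- sub(".*?((?:CALLE|CARRERA).*\\d).*$", "\\1", x,
            perl = TRUE, ignore.case = TRUE)

greedy
#[1] "CARRERA 7"
lazy
#[1] "CALLE 18 CARRERA 7"
```

Inside the capture group the `.*` stays greedy in both cases, which is what extends the match to the last digit.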

Data

str1 <- c("CALLE 15 # 21-32 APARTAMENTO SEGUNDO PISO",
          "THIS STRING WON'T MATCH", "Calle 25 Something")

str2 <- c(str1,  "CALLE 18 CARRERA 7 CONDOMINIO BELLO")
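To apply this to a data frame column as in the question, the result of sub can be assigned straight back, since sub returns a vector of the same length and leaves non-matching strings unchanged. The sketch below assumes a data frame df with a character column DirRes2, as named in the question; its contents here are just the sample data.

```r
# Sample data; the real df from the question is not shown, so this is an assumption
str2 <- c("CALLE 15 # 21-32 APARTAMENTO SEGUNDO PISO",
          "THIS STRING WON'T MATCH", "Calle 25 Something",
          "CALLE 18 CARRERA 7 CONDOMINIO BELLO")
df <- data.frame(DirRes2 = str2, stringsAsFactors = FALSE)

# Overwrite the column: matched rows keep only the address part,
# unmatched rows keep their original text
df$DirRes2 <- sub(".*?((?:CALLE|CARRERA).*\\d).*$", "\\1", df$DirRes2,
                  perl = TRUE, ignore.case = TRUE)
df$DirRes2
```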

Answer 1 posted on 2017-02-13 03:31:48
