Regular expression for address in R

I want to extract the substring that starts with CALLE or CARRERA and ends at the last number found. I can't figure out the regex:

input 1: CALLE 15 # 21-32 APARTAMENTO SEGUNDO PISO

output 1: CALLE 15 # 21-32

input 2: THIS STRING WON'T MATCH

output 2: THIS STRING WON'T MATCH

Then I want to replace each element of the vector with its matched substring. If nothing matches, the original string should be left as it was.

This is what I've tried:

df$DirRes2 <- regmatches(df$DirRes2, regexpr("(CALLE.*\\d | CARRERA.*\\d | .*)", df$DirRes2))
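As far as I can tell, two things go wrong in that attempt. The spaces around | are literal characters in the pattern, so the CALLE branch only matches if a space follows the last digit (leaving a trailing space in the result), and the " .*" branch matches from the first space of any string, so input 2 would come back as " STRING WON'T MATCH" instead of unchanged. Separately, regmatches() applied to a regexpr() result returns only the elements that matched, so the result can be shorter than the input and no longer lines up with the rows of df. A minimal base R sketch that keeps unmatched strings, assuming a plain character vector x (a hypothetical stand-in for df$DirRes2):

```r
# Hypothetical input vector standing in for df$DirRes2
x <- c("CALLE 15 # 21-32 APARTAMENTO SEGUNDO PISO",
       "THIS STRING WON'T MATCH")

# No spaces around "|": inside a regex they would be matched literally.
# The greedy .* followed by \\d makes the match end at the last digit.
m <- regexpr("(CALLE|CARRERA).*\\d", x)

# regexpr() gives -1 for elements without a match; regmatches() returns
# the matched substrings only, in order, so assign them back by position.
out <- x
out[m != -1] <- regmatches(x, m)
out
```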

We can do this either with base R:

sub(".*((?i)(CALLE|CARRERA).*[0-9])[^0-9]+$", "\\1", str1, perl = TRUE)
#[1] "CALLE 15 # 21-32"        "THIS STRING WON'T MATCH" "Calle 25"            
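One property of this first pattern that the update below addresses: because the leading .* is greedy, the capture starts at the last CALLE or CARRERA in the string rather than the first. A quick base R check, using a string that contains both keywords (the fourth example from the update) and a lazy .*? variant for comparison:

```r
x <- "CALLE 18 CARRERA 7 CONDOMINIO BELLO"

# Greedy leading .* consumes as much as possible and then backtracks,
# so the group (CALLE|CARRERA) ends up matching the LAST keyword.
greedy <- sub(".*((?i)(CALLE|CARRERA).*[0-9])[^0-9]+$", "\\1", x, perl = TRUE)

# A lazy leading .*? matches as little as possible, so the group
# starts at the FIRST keyword instead.
lazy <- sub(".*?((?i)(CALLE|CARRERA).*[0-9])[^0-9]+$", "\\1", x, perl = TRUE)

greedy  # "CARRERA 7"
lazy    # "CALLE 18 CARRERA 7"
```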

Or using str_extract from stringr:

library(stringr)
v1 <- trimws(str_extract(str1, "(?i)(CALLE|CARRERA)\\s*[0-9]+\\s*#*\\s*[0-9-]*"))
ifelse(is.na(v1), str1, v1)  # keep the original string where nothing matched
#[1] "CALLE 15 # 21-32"        "THIS STRING WON'T MATCH" "Calle 25"     

Update

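Based on the new example the OP provided in the comments, @Jota's modified version works. The lazy .*? makes the match start at the first CALLE or CARRERA rather than the last, and ignore.case = TRUE replaces the inline (?i):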

sub(".*?((?:CALLE|CARRERA).*\\d).*$", "\\1", str2, perl = TRUE, ignore.case = TRUE) 
#[1] "CALLE 15 # 21-32"        "THIS STRING WON'T MATCH" "Calle 25"  
#[4] "CALLE 18 CARRERA 7" 
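If the extraction is needed in more than one place, the updated pattern can be wrapped in a small function and applied straight to the data frame column from the question. A minimal sketch; extract_address() is a hypothetical name, and the df built here is made-up sample data rather than the OP's real data:

```r
# Hypothetical helper (not from the original answer) wrapping the
# updated pattern so it can be applied to a data frame column.
extract_address <- function(x) {
  # .*?    lazily skips text before the FIRST CALLE/CARRERA
  # (...)  captures from that keyword up to the last digit (greedy .*\\d)
  # .*$    drops whatever follows the last digit
  # Elements with no match are returned unchanged by sub().
  sub(".*?((?:CALLE|CARRERA).*\\d).*$", "\\1", x,
      perl = TRUE, ignore.case = TRUE)
}

df <- data.frame(
  DirRes2 = c("CALLE 15 # 21-32 APARTAMENTO SEGUNDO PISO",
              "THIS STRING WON'T MATCH",
              "Calle 25 Something",
              "CALLE 18 CARRERA 7 CONDOMINIO BELLO"),
  stringsAsFactors = FALSE
)
df$DirRes2 <- extract_address(df$DirRes2)
df$DirRes2
```

Because sub() returns non-matching elements untouched, no ifelse() fallback is needed. A string that contains CALLE or CARRERA but no digit after it (for example "CALLE SIN NUMERO") is also left unchanged, since the capture group requires at least one digit.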

data

str1 <- c("CALLE 15 # 21-32 APARTAMENTO SEGUNDO PISO",
          "THIS STRING WON'T MATCH", "Calle 25 Something")

str2 <- c(str1,  "CALLE 18 CARRERA 7 CONDOMINIO BELLO")
