
Extract | Grep | Substring a character vector in R

Only the entries that start with "passport" (regex ^passport) need to be captured.

Example:

entry = c("passport AR4133553 expires 11 mar 2019","passport 472420180","passport 563220533 (korea, north)",
          "passport iraq","passport m 788439","following data derived from an eritrean passport issued",
          "passport and national") 

Desired output: only the passport number and the country name should be captured.

**passport**  **passport_country**  
"AR4133553"   NA   
"472420180"   NA   
"563220533"   "korea, north"  
NA            "iraq"  
"788439"      NA  
NA            NA  
NA            NA  

Thanks in advance.

Hope this helps!

#sample data
entry = c("passport AR4133553 expires 11 mar 2019",
          "passport 472420180",
          "passport 563220533 (korea, north)",
          "passport iraq",
          "passport m 788439",
          "following data derived from an eritrean passport issued",
          "passport and national") 

#fetch passport number from sample data (i.e. the token containing a digit that comes immediately after 'passport')
pattern <- "^passport\\s((([a-zA-Z]*\\d)|(\\d[a-zA-Z]*))\\S*).*"
passport_no <- gsub(pattern, "\\1", entry, perl=TRUE)
#entries that do not match the pattern have no passport number
passport_no[!grepl(pattern, entry, perl=TRUE)] <- NA

#fetch passport country from sample data
library(maptools)
data(wrld_simpl)
#strip parentheses, then keep every country name from wrld_simpl that occurs in the entry
passport_country <- lapply(gsub("[()]","",entry), function(x) 
  as.character(wrld_simpl@data$NAME[sapply(wrld_simpl@data$NAME, grepl, x, ignore.case=TRUE)]))
#replace empty matches with NA
passport_country <- lapply(passport_country, function(x) 
  if(identical(x, character(0))) NA_character_ else x)
#note that 'Korea, North' is not selected in the above comparison because its official country name is 'Korea, Democratic People's Republic of'

#final data
df <- data.frame(cbind(passport_no, passport_country))
df

Output is:

  passport_no passport_country
1   AR4133553               NA
2   472420180               NA
3   563220533               NA
4          NA             Iraq
5          NA               NA
6          NA          Eritrea
7          NA               NA
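
The output above differs from the desired output in rows 3, 5 and 6: "korea, north" is missed, "788439" is not picked up because "m" sits between "passport" and the number, and "Eritrea" is returned for an entry that does not start with "passport". A minimal base R sketch that reproduces the desired table is given below. It rests on assumptions that are not stated in the question: the passport number is the first whitespace-separated token containing a digit, and the country is either the text in parentheses or the whole remainder after "passport" when that remainder appears in a lookup vector. The name known_countries is hypothetical and would need to be replaced by a real list of country names.

```r
# Sample data from the question
entry <- c("passport AR4133553 expires 11 mar 2019",
           "passport 472420180",
           "passport 563220533 (korea, north)",
           "passport iraq",
           "passport m 788439",
           "following data derived from an eritrean passport issued",
           "passport and national")

# Only entries that begin with "passport " are candidates
is_passport <- grepl("^passport ", entry)

# Text that follows the leading "passport "
rest <- sub("^passport ", "", entry)

# Passport number (assumption): first token that contains a digit
tokens <- strsplit(rest, "\\s+")
passport_no <- vapply(seq_along(entry), function(i) {
  if (!is_passport[i]) return(NA_character_)
  hit <- grep("[0-9]", tokens[[i]], value = TRUE)
  if (length(hit) > 0) hit[1] else NA_character_
}, character(1))

# Country, case 1 (assumption): text enclosed in parentheses
passport_country <- rep(NA_character_, length(entry))
has_paren <- grepl("\\([^)]+\\)", entry)
passport_country[has_paren] <- sub(".*\\(([^)]+)\\).*", "\\1", entry[has_paren])

# Country, case 2 (assumption): the remainder is exactly a known country name.
# known_countries is a hypothetical lookup, not part of the original answer.
known_countries <- c("iraq", "eritrea")
bare <- is.na(passport_country) & tolower(rest) %in% known_countries
passport_country[bare] <- rest[bare]

# Entries that do not start with "passport " are discarded
passport_country[!is_passport] <- NA_character_

df <- data.frame(passport_no, passport_country, stringsAsFactors = FALSE)
df
```

Because the lookup is matched against the whole remainder rather than as a substring, "passport and national" yields NA in both columns, and the entry that merely mentions an Eritrean passport is excluded by the leading "passport " check.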
