
Extract|Grep|Substring character vector in R

Only entries that start with ^passport need to be captured.

For example:

entry = c("passport AR4133553 expires 11 mar 2019","passport 472420180","passport 563220533 (korea, north)",
          "passport iraq","passport m 788439","following data derived from an eritrean passport issued",
          "passport and national") 
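Here "^" is the regex anchor for "starts with". A minimal base R sketch of that filter on the same sample vector (startsWith() is shown only as a non-regex cross-check and needs R 3.3.0 or later):

```r
entry <- c("passport AR4133553 expires 11 mar 2019", "passport 472420180",
           "passport 563220533 (korea, north)", "passport iraq",
           "passport m 788439",
           "following data derived from an eritrean passport issued",
           "passport and national")

# TRUE where the string begins with "passport" ("^" anchors the match at the start)
is_passport <- grepl("^passport", entry)
print(is_passport)
# [1]  TRUE  TRUE  TRUE  TRUE  TRUE FALSE  TRUE

# Same test without regular expressions, for a fixed prefix
stopifnot(identical(is_passport, startsWith(entry, "passport")))

# Only these entries are candidates for extraction
entry[is_passport]
```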

Desired output: the data must capture only the passport number and the country name:

**passport**  **passport_country**  
"AR4133553"   NA   
"472420180"   NA   
"563220533"   "korea, north"  
NA            "iraq"  
"788439"      NA  
NA            NA  
NA            NA  
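A minimal base R sketch that reproduces the passport column of this desired output, including 788439 for "passport m 788439". It assumes (from this sample alone) that the only thing to skip after "passport" is a single lowercase letter followed by whitespace, and that a passport number is a token containing at least one digit. regexec() and regmatches() are base R; perl = TRUE is used for the non-capturing (?:...) group.

```r
entry <- c("passport AR4133553 expires 11 mar 2019", "passport 472420180",
           "passport 563220533 (korea, north)", "passport iraq",
           "passport m 788439",
           "following data derived from an eritrean passport issued",
           "passport and national")

# Assumed rule: after "passport", optionally skip ONE lowercase letter plus
# whitespace (as in "passport m 788439"), then capture a token that
# contains at least one digit
pattern <- "^passport\\s+(?:[a-z]\\s+)?([A-Za-z]*\\d[A-Za-z0-9]*)"

# regexec() records the full match plus the capture group; regmatches()
# turns that into character vectors (character(0) when nothing matched)
hits <- regmatches(entry, regexec(pattern, entry, perl = TRUE))

# Keep the capture group, or NA when the entry did not match
passport_no <- vapply(hits,
                      function(x) if (length(x) == 2) x[2] else NA_character_,
                      character(1))
passport_no
# AR4133553 472420180 563220533 NA 788439 NA NA
```

The single-letter skip is a guess based on one example; if other prefixes occur in the real data, the optional group would need to change.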

Thanks in advance.

Hope this helps!

#sample data
entry = c("passport AR4133553 expires 11 mar 2019",
          "passport 472420180",
          "passport 563220533 (korea, north)",
          "passport iraq",
          "passport m 788439",
          "following data derived from an eritrean passport issued",
          "passport and national") 

#fetch the passport number from the sample data, i.e. the token right after 'passport' that starts with a digit, or with letters followed by a digit
passport_pattern <- "^passport\\s((([a-zA-Z]*\\d)|(\\d[a-zA-Z]*))\\S*).*"
passport_no <- gsub(passport_pattern, "\\1", entry, perl = TRUE)
#set non-matching entries to NA (logical indexing also behaves correctly when nothing matches)
passport_no[!grepl(passport_pattern, entry, perl = TRUE)] <- NA

#fetch passport country from sample data
library(maptools)
data(wrld_simpl)
passport_country <- lapply(gsub("[()]", "", entry), function(x)
  as.character(wrld_simpl@data$NAME[sapply(wrld_simpl@data$NAME, grepl, x, ignore.case = TRUE)]))
passport_country <- lapply(passport_country, function(x)
  if (identical(x, character(0))) NA_character_ else x)
#note that 'Korea, North' is not matched above because its official name in wrld_simpl is 'Korea, Democratic People's Republic of'
#also note that this step is not restricted to entries starting with 'passport' and matches names as substrings, so entry 6 ('eritrean') returns 'Eritrea'

#final data
df <- data.frame(cbind(passport_no, passport_country))
df

Output:

  passport_no passport_country
1   AR4133553               NA
2   472420180               NA
3   563220533               NA
4          NA             Iraq
5          NA               NA
6          NA          Eritrea
7          NA               NA
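Compared with the desired output, the country column above differs in two places: "korea, north" is missed (the wrld_simpl name is spelled differently), and entry 6 gets Eritrea even though it does not start with "passport". A minimal sketch that avoids maptools (which has since been archived on CRAN), assuming the country is either written in parentheses or is the whole remainder after "passport". known_countries is a hypothetical stand-in for a full list of names spelled as they appear in the data.

```r
entry <- c("passport AR4133553 expires 11 mar 2019", "passport 472420180",
           "passport 563220533 (korea, north)", "passport iraq",
           "passport m 788439",
           "following data derived from an eritrean passport issued",
           "passport and national")

# HYPOTHETICAL lookup table: replace with a full list of country names
# spelled the way they appear in your data
known_countries <- c("iraq", "korea, north", "eritrea")

is_passport <- grepl("^passport", entry)

# Text inside the first pair of parentheses, or NA when there is none
paren_hits <- regmatches(entry, regexec("\\(([^)]*)\\)", entry))
paren <- vapply(paren_hits,
                function(x) if (length(x) == 2) x[2] else NA_character_,
                character(1))

# Fallback candidate: everything after the leading "passport "
rest <- sub("^passport\\s+", "", entry, perl = TRUE)
candidate <- ifelse(is.na(paren), rest, paren)

# Keep the candidate only for "passport..." entries whose candidate is an
# exact (case-insensitive) match against the lookup table
passport_country <- ifelse(is_passport & tolower(candidate) %in% known_countries,
                           candidate, NA_character_)
passport_country
```

Matching the whole candidate exactly, rather than searching for names as substrings, is what keeps "and national" and the "eritrean" sentence out of the result.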


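Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.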

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM