
Partial matching in R: multiple matches

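I'm using the code below to do partial matching with one match per row, but I have a follow-up question: suppose we add a criterion for fish, and we want "dog fish" to be classified as both fish and canine. Is this possible?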

d<-data.frame(name=c("brown cat", "blue cat", "big lion", "tall tiger", 
                 "black panther", "short cat", "red bird",
                 "short bird stuffed", "big eagle", "bad sparrow",
                 "dog fish", "head dog", "brown yorkie",
                 "lab short bulldog"), label=1:14)

Define the regular expressions at the start of the code:

regexes <- list(c("(cat|lion|tiger|panther)","feline"),
            c("(bird|eagle|sparrow)","avian"),
            c("(dog|yorkie|bulldog)","canine"))

Create a vector with the same length as the data frame:

output_vector <- character(nrow(d))

For each regular expression:

for (i in seq_along(regexes)) {
  # Grep through d$name and, where there are matches, insert the relevant tag
  # into the output vector. A later match overwrites an earlier one, so each
  # row ends up with at most one tag.
  output_vector[grepl(x = d$name, pattern = regexes[[i]][1])] <- regexes[[i]][2]
}

Insert the now-populated output vector into the data frame:

d$species <- output_vector

Desired output:

#                 name label species
#1           brown cat     1  feline
#2            blue cat     2  feline
#3            big lion     3  feline
#4          tall tiger     4  feline
#5       black panther     5  feline
#6           short cat     6  feline
#7            red bird     7   avian
#8  short bird stuffed     8   avian
#9           big eagle     9   avian
#10        bad sparrow    10   avian
#11           dog fish    11  canine, fish
#12           head dog    12  canine
#13       brown yorkie    13  canine
#14  lab short bulldog    14  canine
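One way to get multiple tags per row from the loop above is to append each new tag instead of overwriting the previous one. This is a minimal sketch, assuming a hypothetical `"(fish)"` entry is added to `regexes`; tags are joined with `", "` in the order the patterns appear in the list.

```r
# Sample data from the question (stringsAsFactors = FALSE keeps names as character)
d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"),
                label = 1:14, stringsAsFactors = FALSE)

# Same pattern/tag pairs as above, plus a hypothetical "fish" entry
regexes <- list(c("(cat|lion|tiger|panther)", "feline"),
                c("(bird|eagle|sparrow)", "avian"),
                c("(dog|yorkie|bulldog)", "canine"),
                c("(fish)", "fish"))

output_vector <- character(nrow(d))

for (i in seq_along(regexes)) {
  hits <- grepl(pattern = regexes[[i]][1], x = d$name)
  # Append rather than overwrite: empty slots get the tag itself,
  # slots that already hold a tag get ", <tag>" added to the end.
  output_vector[hits] <- ifelse(output_vector[hits] == "",
                                regexes[[i]][2],
                                paste(output_vector[hits], regexes[[i]][2], sep = ", "))
}

d$species <- output_vector
```

Because the canine pattern comes before the fish pattern in `regexes`, row 11 ends up as `"canine, fish"`, matching the desired output; reorder the list to change the tag order.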

The original Stack Overflow question is here: partial string matching r

I would do this with a cross join.

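library(dplyr)
library(stringi)

# Lookup table: each partial string and the category it maps to
# (tibble() replaces the deprecated data_frame())
key <- tibble(partial = c("cat", "lion", "tiger", "panther",
                          "bird", "eagle", "sparrow",
                          "dog", "yorkie", "bulldog"),
              category = c("feline", "feline", "feline", "feline",
                           "avian", "avian", "avian",
                           "canine", "canine", "canine"))

# merge() with no shared columns returns the Cartesian product (a cross join);
# filter() then keeps only the rows where the partial string occurs in the name.
# A name can appear more than once, e.g. "lab short bulldog" matches both
# "dog" and "bulldog".
d %>%
  merge(key) %>%
  filter(name %>% stri_detect_fixed(partial))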
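To turn the long cross-join result into one row per name with a comma-separated `species` column, the matches can be grouped and collapsed. This is a minimal sketch, assuming dplyr >= 1.0 (for `.groups = "drop"`) and a hypothetical extra `"fish"` row in `key`. `unique()` drops the duplicate canine hit from "dog"/"bulldog", and `sort()` makes the tag order alphabetical rather than dependent on row order.

```r
library(dplyr)
library(stringi)

d <- data.frame(name = c("brown cat", "blue cat", "big lion", "tall tiger",
                         "black panther", "short cat", "red bird",
                         "short bird stuffed", "big eagle", "bad sparrow",
                         "dog fish", "head dog", "brown yorkie",
                         "lab short bulldog"),
                label = 1:14, stringsAsFactors = FALSE)

# The answer's lookup table plus a hypothetical "fish" row
key <- tibble(partial  = c("cat", "lion", "tiger", "panther",
                           "bird", "eagle", "sparrow",
                           "dog", "yorkie", "bulldog",
                           "fish"),
              category = c(rep("feline", 4), rep("avian", 3),
                           rep("canine", 3), "fish"))

result <- d %>%
  merge(key) %>%                                # cross join: every name x every partial
  filter(stri_detect_fixed(name, partial)) %>%  # keep pairs where partial occurs in name
  group_by(name, label) %>%
  summarise(species = paste(sort(unique(category)), collapse = ", "),
            .groups = "drop") %>%
  arrange(label)
```

Names that match nothing in `key` are dropped by `filter()`; a `left_join()` back onto `d` would restore them with `NA` if that matters.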
