
Searching for a specific character set in a data frame in R

I have created a character vector with some missing values, like this:

bp <- rep(NA, 5)
bp[c(2,4)] <- c("sugar","milk")
bp

> bp
[1] NA  "sugar" NA  "milk" NA 

I'm looking for a way to use bp to search a larger data frame for similar occurrences of bp (and their positions), with the NAs filled in by whatever values appear there.

For example,

[1] any1  "sugar" any2  "milk" any3 
[2] any2  "sugar" any5  "milk" any1 
[3] any6  "sugar" any1  "milk" any3 
[4] any8  "sugar" any7  "milk" any6
[5] any1  "sugar" any2  "milk" any3 

EDIT: A part of the data frame looks something like this:

c("milk", "sugar", "sugar", "creme", "carw", "milk", "creme", "carw", 
"sugar", "carw", "creme", "sugar", "sugar", "milk", "milk", "creme", 
"sugar", "sugar", "carw", "carw", "carw", "milk", "sugar", "sugar", 
"carw", "sugar", "milk", "sugar", "creme", "carw", "carw", "carw", 
"creme", "carw", "carw", "creme", "creme", "milk", "carw", "milk", 
"milk", "creme", "creme", "creme", "milk", "milk", "creme", "carw", 
"carw", "milk", "milk", "creme", "creme", "carw", "carw", "milk", 
"sugar", "carw", "milk", "carw", "creme", "sugar", "sugar", "creme", 
"sugar", "sugar", "creme", "sugar", "carw", "sugar", "carw", 
"carw", "creme", "sugar", "milk", "milk", "carw", "carw", "milk", 
"creme", "sugar", "carw", "milk", "sugar", "sugar", "milk", "sugar", 
"creme", "milk", "milk", "carw", "milk", "sugar", "carw", "sugar", 
"carw", "creme", "creme", "carw", "milk", "milk", "milk", "milk", 
"carw", "carw", "milk", "milk", "carw", "sugar", "milk", "milk", 
"milk", "creme", "carw", "creme", "milk", "milk", "milk", "creme", 
"carw", "milk", "carw", "carw", "carw", "carw", "carw", "carw"
)

I would normally use this for searching the entire data frame, but in this situation it's tricky.

library(data.table)

n1 <- length(bp)
bp.pos <- setDT(data.frame)[, which(Reduce(`&`, Map(`==`,
    shift(value1, seq(n1) - 1, type = "lead"), bp)))]
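
A likely reason this gets tricky: `==` returns NA whenever either side is NA, so combining the comparisons with `&` can never yield TRUE for a pattern that contains NA. The base R sketch below illustrates this on a single window. The names `window` and `hits` and the window values are made up for the illustration and are not part of the original code.

```r
bp <- c(NA, "sugar", NA, "milk", NA)

# Hypothetical 5-element slice of the data, for illustration only
window <- c("creme", "sugar", "sugar", "milk", "milk")

# Element-wise comparison: NA positions in bp give NA, not TRUE
hits <- bp == window
print(hits)

# all() is therefore NA rather than TRUE, so which() would skip this window
print(all(hits))

# Treating NA in the pattern as "matches anything" restores the match
print(all(hits | is.na(bp)))
```

So any fix has to either skip the NA positions of bp or convert the NA comparisons to TRUE before reducing.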

Any help would be appreciated.

Here's an attempt based on what I understand of your problem. I call the vector you shared x:

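# Column i compares bp[i] with the element of x at position i of every window
test = sapply(seq_along(bp), function(i) bp[i] == x[(0 + i):(length(x) - length(bp) + i)])
# Comparisons involving NA return NA; treat those as wildcard matches
test = test | is.na(test)
# Rows where every position matched give the start indices
res = which(apply(test, 1, all))
# Expand each start index to the full index range of its window
res = lapply(res, function(x) x + seq_along(bp) - 1)
final = lapply(res, function(z) x[z])
names(final) = lapply(res, "[", 1)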

# $`11`
# [1] "creme" "sugar" "sugar" "milk"  "milk" 
# 
# $`12`
# [1] "sugar" "sugar" "milk"  "milk"  "creme"
# 
# $`56`
# [1] "milk"  "sugar" "carw"  "milk"  "carw" 
# 
# $`73`
# [1] "creme" "sugar" "milk"  "milk"  "carw" 
# 
# $`80`
# [1] "creme" "sugar" "carw"  "milk"  "sugar"
# 
# $`83`
# [1] "milk"  "sugar" "sugar" "milk"  "sugar"
# 
# $`86`
# [1] "milk"  "sugar" "creme" "milk"  "milk" 
# 
# $`108`
# [1] "carw"  "sugar" "milk"  "milk"  "milk" 

The result is a named list where each name is the starting index in x and each value is the matched vector. This gives you both the "where" and the match in one object.
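
If only the starting positions are needed, the same idea can be sketched by comparing just the non-NA positions of the pattern. This is a minimal base R variant under that assumption, not the code above; `find_pattern` and the `toy` vector are hypothetical names introduced for the example.

```r
# Hypothetical helper: return every start index where the non-NA entries
# of `pattern` line up with `x`; NA entries in `pattern` act as wildcards.
find_pattern <- function(x, pattern) {
  n <- length(pattern)
  if (length(x) < n) return(integer(0))
  starts <- seq_len(length(x) - n + 1)   # only windows that fit entirely in x
  fixed <- which(!is.na(pattern))        # positions that must match exactly
  ok <- rep(TRUE, length(starts))
  for (k in fixed) {
    ok <- ok & (x[starts + k - 1] == pattern[k])
  }
  starts[which(ok)]                      # which() also drops NA caused by NA in x
}

bp <- c(NA, "sugar", NA, "milk", NA)
toy <- c("milk", "sugar", "carw", "milk", "creme",
         "sugar", "sugar", "milk", "milk")
find_pattern(toy, bp)
# [1] 1 5
```

The matched windows can then be recovered with something like lapply(find_pattern(x, bp), function(s) x[s + seq_along(bp) - 1]).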
