R split strings in vector with different lengths
I have a problem in R trying to split a vector of strings into a vector of vectors. If anyone can help me, please do; I am stuck.
I have:
V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
Using strsplit I get:
s <- strsplit(V, " ")
s
[[1]]
[1] "AAAAA"
[[2]]
[1] "AAAAA" "BBBBB"
[[3]]
[1] "CCCCC" "DDDDD"
However, I cannot access these to compare them. I would like something like:
s
[1] "AAAAA"
[2] "AAAAA" "BBBBB"
[3] "CCCCC" "DDDDD"
I would then like to see if the elements of each of these vectors are included in my validation vector (like c("AAAAA", "BBBBB", "CCCCC")) and return a boolean at the end (TRUE if all elements are in, FALSE otherwise). For now my problem is getting those vectors. Any suggestion is welcome.
strsplit returns a list; you can go through it by using lapply with a custom function:
V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
s <- strsplit(V, split = " ")
val <- c("AAAAA", "BBBBB", "CCCCC")
lapply(s, function(x) x %in% val)
You can access list elements like this:
s[[1]]
s[[2]]
To check whether all elements of each split string are present in val:
ok <- lapply(s, function(x) all(x %in% val))
ok
#output
[[1]]
[1] TRUE
[[2]]
[1] TRUE
[[3]]
[1] FALSE
To convert this list to a vector:
ok <- unlist(ok)
To return the original elements from V that passed the check:
V[ok]
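As a variation on the steps above, vapply can return the logical vector directly, so the unlist step is not needed. This is only a sketch: the names validation and is_valid are my own choices, not from the question.

```r
V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
validation <- c("AAAAA", "BBBBB", "CCCCC")  # assumed validation vector

# split on a literal space; fixed = TRUE avoids treating the pattern as a regex
s <- strsplit(V, " ", fixed = TRUE)

# one TRUE/FALSE per element of V: are all of its pieces in validation?
is_valid <- vapply(s, function(x) all(x %in% validation), logical(1))
is_valid

# keep only the strings that passed the check
V[is_valid]
```

vapply also fails loudly if the function ever returns something other than a single logical value, which lapply followed by unlist would not.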
Using tidyverse, you could go with:
V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
validation <- c("AAAAA", "BBBBB", "CCCCC")
library(purrr)
library(stringr)
str_split(V, pattern = " ") %>%
  map_lgl(~ all(.x %in% validation))
#> [1] TRUE TRUE FALSE
You could also combine this with dplyr to obtain a clear summary of which strings are validated and which are not.
library(dplyr, warn.conflicts = FALSE)
# tibble() replaces the deprecated data_frame()
tibble(V) %>%
  mutate(validate = str_split(V, pattern = " ") %>%
           map_lgl(~ all(.x %in% validation)))
#> # A tibble: 3 x 2
#>   V           validate
#>   <chr>       <lgl>
#> 1 AAAAA       TRUE
#> 2 AAAAA BBBBB TRUE
#> 3 CCCCC DDDDD FALSE
R does not have a vector of vectors.
To emulate this behavior you would usually use lists and the apply family.
input_vector <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
# split the string like you did
s <- strsplit(input_vector, split = " ")
s
#> [[1]]
#> [1] "AAAAA"
#>
#> [[2]]
#> [1] "AAAAA" "BBBBB"
#>
#> [[3]]
#> [1] "CCCCC" "DDDDD"
# create a vector with the conditions that we look for
validation_vector <- c("AAAAA", "BBBBB")
# create a matrix of matches
res_matrix <- sapply(s, function(s_part) {
  validation_vector %in% s_part
})
# check if all validation_vector elements are true for a given input_vector-string
# by applying the 'all'-function over each column ("are all elements for a given column TRUE?")
res_vector <- apply(res_matrix, 2, all)
# for aesthetic purposes: add the name of the initial input_vector again
names(res_vector) <- input_vector
# display the result
res_vector
#>       AAAAA AAAAA BBBBB CCCCC DDDDD
#>       FALSE        TRUE       FALSE
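Note that the code above asks whether every element of validation_vector occurs in the string, which is the reverse of asking whether every piece of the string occurs in validation_vector. A small sketch of both directions side by side, in case it helps to pick the right one (the two result names are made up for illustration):

```r
input_vector <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
validation_vector <- c("AAAAA", "BBBBB")
s <- strsplit(input_vector, split = " ")

# direction 1: every piece of the string is found in validation_vector
pieces_in_validation <- vapply(
  s, function(x) all(x %in% validation_vector), logical(1)
)

# direction 2 (as in the code above): every validation element is found in the string
validation_in_pieces <- vapply(
  s, function(x) all(validation_vector %in% x), logical(1)
)

pieces_in_validation
validation_in_pieces
```

With this input the two checks disagree on the first string: "AAAAA" consists only of valid pieces, but it does not contain "BBBBB".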
You can have a look at the *apply family of functions. For example, using sapply to apply the strsplit function to each element of your vector you get
vs <- sapply(V, strsplit, split = " ")
vs
$AAAAA
[1] "AAAAA"
$`AAAAA BBBBB`
[1] "AAAAA" "BBBBB"
$`CCCCC DDDDD`
[1] "CCCCC" "DDDDD"
Further, to check against your validation vector you can do
validation <- c("AAAAA", "BBBBB", "CCCCC")
vs_in_val <- sapply(vs, `%in%`, validation)
vs_in_val
$AAAAA
[1] TRUE
$`AAAAA BBBBB`
[1] TRUE TRUE
$`CCCCC DDDDD`
[1] TRUE FALSE
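To get the single TRUE/FALSE per string that the question asks for, the per-piece results can be collapsed with all. The following is a sketch under my own naming (all_in_val is not from the answer); it builds the named list with setNames so that it runs on its own.

```r
V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
validation <- c("AAAAA", "BBBBB", "CCCCC")

# named list of split strings, one entry per element of V
vs <- setNames(strsplit(V, split = " "), V)

# per-piece membership, as in the step above
vs_in_val <- lapply(vs, `%in%`, validation)

# collapse each logical vector to one value: TRUE only if every piece matched
all_in_val <- vapply(vs_in_val, all, logical(1))
all_in_val
```

The names of the list carry over, so the result shows which original string each value belongs to.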
strsplit can help you do this if you combine it with an apply-family function such as sapply.
V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
s <- strsplit(V, " ")
# fraction of the pieces of each string that appear in the validation vector
sapply(s, function(x) sum(x %in% c("AAAAA", "BBBBB", "CCCCC")) / length(x))
[1] 1.0 1.0 0.5
If the result is 0, none of the string's elements are in your validation vector.
If it is 1, all of the elements are in your validation vector.
If it is between 0 and 1, only some of the elements are in your validation vector.
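The three cases can also be turned into labels. A minimal sketch, assuming the same V and validation vector as above; the names frac and status and the label strings are purely illustrative.

```r
V <- c("AAAAA", "AAAAA BBBBB", "CCCCC DDDDD")
validation <- c("AAAAA", "BBBBB", "CCCCC")
s <- strsplit(V, " ")

# mean() of a logical vector is the fraction of TRUE values,
# i.e. the same ratio as sum(...) / length(x)
frac <- vapply(s, function(x) mean(x %in% validation), numeric(1))

# map the ratio to one of three labels
status <- ifelse(frac == 1, "all", ifelse(frac == 0, "none", "some"))

frac
status
```

If only a TRUE/FALSE answer is needed, frac == 1 gives it directly.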