
grepl and subset from a list of vectors?

I have a list generated from the code below:

df <- USArrests
df <- na.omit(df)
df <- scale(df)
d <- dist(df, method = "euclidean")

# Hierarchical clustering using Complete Linkage
hc1 <- hclust(d, method = "complete" )

library(dendextend)
dend15 <- d %>% hclust(method = "average") %>% as.dendrogram
dend15 %>% plot

subtrees <- partition_leaves(dend15)

What I would like to do is create a new list by using grep to subset for the keyword "Maine". Is this possible?

Sample of the data:

head(subtrees, 20)

[[1]]
 [1] "North Dakota"   "Maine"          "Iowa"           "New Hampshire"  "Vermont"       
 [6] "South Dakota"   "West Virginia"  "Delaware"       "Rhode Island"   "Massachusetts" 
[11] "New Jersey"     "Arkansas"       "Kentucky"       "Connecticut"    "Minnesota"     
[16] "Wisconsin"      "Idaho"          "Montana"        "Nebraska"       "Wyoming"       
[21] "Virginia"       "Oklahoma"       "Indiana"        "Kansas"         "Ohio"          
[26] "Pennsylvania"   "Hawaii"         "Utah"           "Oregon"         "Washington"    
[31] "Alaska"         "Georgia"        "Tennessee"      "Alabama"        "Louisiana"     
[36] "North Carolina" "Mississippi"    "South Carolina" "California"     "Nevada"        
[41] "Florida"        "Colorado"       "Missouri"       "Texas"          "Illinois"      
[46] "New York"       "Arizona"        "Michigan"       "Maryland"       "New Mexico"    

[[2]]
 [1] "North Dakota"  "Maine"         "Iowa"          "New Hampshire" "Vermont"      
 [6] "South Dakota"  "West Virginia" "Delaware"      "Rhode Island"  "Massachusetts"
[11] "New Jersey"    "Arkansas"      "Kentucky"      "Connecticut"   "Minnesota"    
[16] "Wisconsin"     "Idaho"         "Montana"       "Nebraska"      "Wyoming"      
[21] "Virginia"      "Oklahoma"      "Indiana"       "Kansas"        "Ohio"         
[26] "Pennsylvania"  "Hawaii"        "Utah"          "Oregon"        "Washington"   

[[3]]
[1] "North Dakota"  "Maine"         "Iowa"          "New Hampshire" "Vermont"      
[6] "South Dakota"  "West Virginia"

[[4]]
[1] "North Dakota"  "Maine"         "Iowa"          "New Hampshire"

[[5]]
[1] "North Dakota"

[[6]]
[1] "Maine"         "Iowa"          "New Hampshire"

[[7]]
[1] "Maine"

[[8]]
[1] "Iowa"          "New Hampshire"

[[9]]
[1] "Iowa"

Use lapply to iterate over the list and apply grep to each element:

lapply(subtrees, grep, pattern = "Maine", value = TRUE)

You might want to remove the empty elements (character(0), returned where nothing matched) from the result, which can be done using Filter:

Filter(function(x) length(x) > 0, lapply(subtrees, grep, pattern = "Maine", value = TRUE))

#[[1]]
#[1] "Maine"

#[[2]]
#[1] "Maine"

#[[3]]
#[1] "Maine"

#[[4]]
#[1] "Maine"

#[[5]]
#[1] "Maine"

#[[6]]
#[1] "Maine"
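
To see why the filtering step matters, here is a minimal self-contained sketch on a small hand-made list. The name toy is a hypothetical stand-in for subtrees, not the real dendextend output. Elements without a match come back as character(0), and base R's lengths() is one alternative to Filter for dropping them.

```r
# 'toy' is a hypothetical stand-in for 'subtrees'
toy <- list(
  c("North Dakota", "Maine", "Iowa"),
  c("North Dakota"),
  c("Maine"),
  c("Iowa", "New Hampshire")
)

# grep(value = TRUE) returns the matching strings,
# or character(0) when nothing in the vector matches
hits <- lapply(toy, grep, pattern = "Maine", value = TRUE)

# Drop the empty results; same effect as
# Filter(function(x) length(x) > 0, hits)
non_empty <- hits[lengths(hits) > 0]
print(non_empty)
```

Note that either way only the matching strings are kept, not the full vectors they came from.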

A tidyverse equivalent of the lapply call could be:

purrr::map(subtrees, ~stringr::str_subset(.x, "Maine"))
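
Like the lapply version, this map call leaves zero-length results in place for elements without a match. A possible way to handle that with purrr is sketched below, assuming purrr and stringr are installed; toy is a hypothetical stand-in for subtrees.

```r
library(purrr)
library(stringr)

# 'toy' is a hypothetical stand-in for 'subtrees'
toy <- list(
  c("North Dakota", "Maine", "Iowa"),
  c("North Dakota"),
  c("Maine")
)

# str_subset() keeps the matching strings;
# compact() then drops the zero-length results
matches <- compact(map(toy, ~ str_subset(.x, "Maine")))

# keep() retains the complete vectors for which the predicate is TRUE
kept <- keep(toy, ~ any(str_detect(.x, "Maine")))
```

Here matches holds only the matched strings, while kept holds the complete vectors that contain a match.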

To get the indices of the list elements that contain a match, we can use grepl along with which:

which(sapply(subtrees, function(x) any(grepl("Maine", x))))
#[1] 1 2 3 4 6 7
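
If the goal is a new list holding the complete vectors that contain "Maine", the same logical test can be used to subset the list directly. The following is a minimal sketch on a hypothetical toy list standing in for subtrees. It uses vapply instead of sapply so the result is always a logical vector, and fixed = TRUE so the pattern is treated as a literal string, not a regular expression.

```r
# 'toy' is a hypothetical stand-in for 'subtrees'
toy <- list(
  c("North Dakota", "Maine", "Iowa"),
  c("North Dakota"),
  c("Maine"),
  c("Iowa", "New Hampshire")
)

# TRUE where the element contains a string with the substring "Maine"
has_maine <- vapply(
  toy,
  function(x) any(grepl("Maine", x, fixed = TRUE)),
  logical(1)
)

# Subset the list itself, keeping the complete vectors
maine_subtrees <- toy[has_maine]

# Whole-string match, in case substring matches are not wanted
has_exact <- vapply(toy, function(x) "Maine" %in% x, logical(1))
```

On state names the substring and whole-string tests agree, but %in% may be the safer choice when one label can be contained in another (for example "Virginia" inside "West Virginia").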
