Is there a way in R to find a combination of words (or sentences) within a certain range in a string?
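I am trying to find all strings that contain a combination of words/sentences, possibly separated by other words, but only up to a fixed limit.
Example: I want the combination of "bought" and "watch", with at most 2 words separating them.
I couldn't find anything in R that comes close to what I want.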
To find simple words/sentences in a string, I use `str_extract_all` from `stringr`, like this:
```r
library(stringr)

# one alternation of all words/sentences, anchored on word boundaries
my_analysis <- str_c("\\b(", str_c(my_list_of_words_and_sentences, collapse = "|"), ")\\b")
df$words_and_sentences_found <- str_extract_all(df$my_strings, my_analysis)
```
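Since the question already builds a regex for `str_extract_all`, one possible sketch is to encode the gap directly in the pattern: match the first word, then at most `max_gap` whitespace-separated words, then the second word. `near_pattern()` is a hypothetical helper, not part of stringr. The sketch assumes words are separated by whitespace only (punctuation attached to a word, as in `bought,`, blocks a match), that the order matters (`bought` before `watch`), and that the inputs contain no regex special characters.

```r
library(stringr)

# Hypothetical helper: regex for `first`, then 0..max_gap intervening
# whitespace-separated words, then `second` (assumes plain whitespace).
near_pattern <- function(first, second, max_gap = 2) {
  str_c("\\b", first, "\\b(?:\\s+\\S+){0,", max_gap, "}\\s+", second, "\\b")
}

strings <- c("I bought a beautiful and shiny watch",
             "I bought a shiny watch",
             "It was not bought but watch")

pattern <- near_pattern("bought", "watch", max_gap = 2)
str_detect(strings, pattern)
#> [1] FALSE  TRUE  TRUE

# the matched spans, e.g. "bought a shiny watch" for the second string
str_extract_all(strings, pattern)
```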
One way to think about it:
```r
library(stringr)

my_list2 <- list("I bought a beautiful and shiny watch", "I bought a shiny watch",
                 "It was not bought but watch")
# split each string into words and pool them into one vector
as_words <- unlist(str_split(my_list2, ' '))
t1 <- which(as_words == 'bought')
t2 <- which(as_words == 'watch')
t1
#> [1]  2  9 16
t2
#> [1]  7 12 18
t2 - t1
#> [1] 5 3 2
```
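The position idea above pools the words of all strings with `unlist()`, so a `bought` in one string could be paired with a `watch` in another, and subtracting `t1` from `t2` element-wise assumes each string contains exactly one of each word. Also, `t2 - t1` is the distance between positions, so the number of words in between is `t2 - t1 - 1`; "at most 2 words in between" means a distance of at most 3. A minimal per-string sketch, assuming whitespace-separated words and using `has_near_pair()` as a hypothetical helper name:

```r
# Hypothetical helper: TRUE if `first` is followed by `second` with at most
# `max_gap` words in between, checked within a single string.
has_near_pair <- function(text, first, second, max_gap = 2) {
  words <- strsplit(text, "\\s+")[[1]]
  p1 <- which(words == first)
  p2 <- which(words == second)
  if (length(p1) == 0 || length(p2) == 0) return(FALSE)
  # all pairwise distances p2[i] - p1[j]; positive means `second` comes later
  d <- outer(p2, p1, "-")
  any(d >= 1 & d <= max_gap + 1)
}

my_list2 <- list("I bought a beautiful and shiny watch", "I bought a shiny watch",
                 "It was not bought but watch")
vapply(my_list2, has_near_pair, logical(1), first = "bought", second = "watch")
#> [1] FALSE  TRUE  TRUE
```

Using `outer()` compares every occurrence of one word with every occurrence of the other, so strings with repeated words are handled without assuming one match per string.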
You can use skip-grams for this:
```r
library(tidyverse)
library(tidytext)
df <- tibble(id = 1:3,
             txt = c("I bought a beautiful and shiny watch",
                     "I bought a shiny watch",
                     "The watch is very shiny"))
tidy_ngrams <- df %>%
  ## use k for the skip, and n for what degree of n-gram:
  unnest_tokens(ngram, txt, token = "skip_ngrams", n_min = 2, n = 2, k = 2)
tidy_ngrams
#> # A tibble: 33 × 2
#> id ngram
#> <int> <chr>
#> 1 1 i bought
#> 2 1 i a
#> 3 1 i beautiful
#> 4 1 bought a
#> 5 1 bought beautiful
#> 6 1 bought and
#> 7 1 a beautiful
#> 8 1 a and
#> 9 1 a shiny
#> 10 1 beautiful and
#> # … with 23 more rows
tidy_ngrams %>%
  filter(ngram == "bought watch")
#> # A tibble: 1 × 2
#> id ngram
#> <int> <chr>
#> 1 2 bought watch
```
Created on 2022-06-03 by the reprex package (v2.0.1)
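To make the meaning of `n = 2, k = 2` concrete, here is a rough base-R re-creation of the skip-bigrams: each word is paired with each of the next `k + 1` words, i.e. 0 to `k` words are skipped, which is exactly the "at most 2 words in between" condition. This is an illustrative sketch with a hypothetical `skip_bigrams()` helper, not the tokenizers implementation; it only lowercases and splits on whitespace, whereas `unnest_tokens()` uses its own word tokenization.

```r
# Hypothetical base-R helper: ordered word pairs skipping 0..k words.
skip_bigrams <- function(text, k = 2) {
  words <- strsplit(tolower(text), "\\s+")[[1]]
  n <- length(words)
  out <- character(0)
  for (i in seq_len(max(n - 1, 0))) {
    j <- (i + 1):min(n, i + k + 1)  # the next k + 1 words after position i
    out <- c(out, paste(words[i], words[j]))
  }
  out
}

txt <- c("I bought a beautiful and shiny watch",
         "I bought a shiny watch",
         "The watch is very shiny")
grams <- lapply(txt, skip_bigrams, k = 2)
lengths(grams)
#> [1] 15  9  9
which(vapply(grams, function(g) "bought watch" %in% g, logical(1)))
#> [1] 2
```

For these three strings this gives 15 + 9 + 9 = 33 pairs, in the same order as the first rows of the tibble above, and only the second string contains "bought watch".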