How do I choose a random one-letter word, 2-letter word, 3-letter word, ..., up to the longest word from each sentence in R?

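I am trying to randomly pick a one-letter word, a 2-letter word, a 3-letter word, and so on up to the word with the most letters from each sentence, and then combine these words with spaces into a phrase.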

library(dplyr)   # sample_n() and %>%
library(stringr) # sentences and str_remove_all()

new_data <- sample_n(data.frame(stringr::sentences), 30)
new_data

split_data <- data.frame(X = str_remove_all(new_data$stringr..sentences, "[.,]"))
split_data

split_data <- strsplit(split_data$X," ")
split_data

for(i in split_data){
   generated <- split_data %>%
   lapply(nchar)
}

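It should produce output like this:

The sentences I randomly selected are

"The long journey home took a year."

"The young prince became heir to the throne."

The generated phrases are

"a year journey"

"heir young became"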

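It is not entirely clear what end result you want: you mention several things, but your expected output only shows a few random words.

Here are a few examples of how you can work with a sentence by words and by characters.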

library(stringi)

x <- "The long journey home took a year."

words <- stri_extract_all_words(x)[[1]]
words
# [1] "The"     "long"    "journey" "home"    "took"    "a"       "year" 

all_letters <- unlist(strsplit(words, ""))
all_letters
# [1] "T" "h" "e" "l" "o" "n" "g" "j" "o" "u" "r" "n" "e" "y" "h" "o" "m" "e" "t" "o" "o" "k" "a" "y" "e" "a" "r"

letter_counts <- rle(sort(stri_trans_tolower(all_letters)))
letter_counts

# Run Length Encoding
#   lengths: int [1:14] 2 4 1 2 1 1 1 1 2 5 ...
#   values : chr [1:14] "a" "e" "g" "h" "j" "k" "l" "m" "n" "o" "r" "t" "u" "y"

l <- 3
sample(letter_counts$values, l)
# [1] "m" "h" "j"

# most frequently occurring letter
letter_counts$values[which(letter_counts$lengths == max(letter_counts$lengths))]
# [1] "o"

n <- 4
paste(sample(words, n), collapse = " ")
# [1] "took a year journey" (or any other random combination of "n" words)

words[which(nchar(words) == max(nchar(words)))] # longest word
# [1] "journey"
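
The pieces above can be combined to pick one random word per word length, which seems to be what the question asks for. The following is only a minimal base R sketch under that assumption; the helper name pick_one_per_length is hypothetical and not part of the original answer.

```r
# Hypothetical helper: pick one random word for each distinct word length
# in a sentence and join the picks with spaces.
pick_one_per_length <- function(sentence) {
  # Strip periods and commas, then split on runs of whitespace
  words <- strsplit(gsub("[.,]", "", sentence), "\\s+")[[1]]
  # Group the words by character count; groups come out in increasing length
  by_length <- split(words, nchar(words))
  # Draw one word from each group by random index
  picked <- vapply(
    by_length,
    function(w) w[sample.int(length(w), 1)],
    character(1)
  )
  paste(picked, collapse = " ")
}

pick_one_per_length("The long journey home took a year.")
```

For this sentence the result always starts with "a The" and ends with "journey", because those are the only words of length 1, 3 and 7; the third word is drawn at random from the four-letter words.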

Is this what you are looking for?

new_data <- dplyr::sample_n(data.frame(stringr::sentences), 30)
new_data

split_data <- data.frame(X = stringr::str_remove_all(new_data$stringr..sentences, "[.,]"))

# longest word length to consider
max_len <- 10

split_data$X |>
  stringr::str_split("[:space:]") |>
  purrr::map(\(words) {
    # order the words of one sentence by length
    words_sorted <- words[order(nchar(words))]

    tibble::tibble(
      word = words_sorted,
      word_length = nchar(words_sorted)
    ) |>
      dplyr::filter(word_length <= max_len) |>
      dplyr::group_by(word_length) |>
      dplyr::sample_n(1) |>  # one random word per length
      dplyr::pull(word) |>
      paste0(collapse = " ")
  })

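For a sentence such as

"He picked up the dice for a second roll."

it gives you one random word of each length:

"a up the dice picked"

If you want to cap the length of the words that are considered, change max_len.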

