I'm trying to pick, from each sentence, a random 1-letter word, a random 2-letter word, a random 3-letter word, and so on up to the longest word. Then I want to combine these words, separated by spaces, into a phrase.
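library(dplyr)
library(stringr)

new_data <- sample_n(data.frame(stringr::sentences), 30)  # 30 random sentences
new_data
# drop periods and commas
split_data <- data.frame(X = str_remove_all(new_data$stringr..sentences, "[.,]"))
split_data
# one character vector of words per sentence
split_data <- strsplit(split_data$X, " ")
split_data
# number of characters of every word, per sentence
generated <- lapply(split_data, nchar)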
It should have an output like this:
The sentences I randomly selected are:
"The long journey home took a year."
"The young prince became heir to the throne."
…
The generated phrases are:
"a The year journey"
"to the heir young became"
…
It is not completely clear what you want to achieve as a final result: you mention a few things, but your expected output only shows a few randomized words.
Here are a few examples of what you can do with a sentence, working on words and on characters.
library(stringi)
x <- "The long journey home took a year."
words <- stri_extract_all_words(x)[[1]]
words
# [1] "The" "long" "journey" "home" "took" "a" "year"
all_letters <- unlist(strsplit(words, ""))
all_letters
# [1] "T" "h" "e" "l" "o" "n" "g" "j" "o" "u" "r" "n" "e" "y" "h" "o" "m" "e" "t" "o" "o" "k" "a" "y" "e" "a" "r"
letter_counts <- rle(sort(stri_trans_tolower(all_letters)))
letter_counts
# Run Length Encoding
# lengths: int [1:14] 2 4 1 2 1 1 1 1 2 5 ...
# values : chr [1:14] "a" "e" "g" "h" "j" "k" "l" "m" "n" "o" "r" "t" "u" "y"
l <- 3  # number of distinct letters to draw at random
sample(letter_counts$values, l)
# [1] "m" "h" "j"
# most frequently occurring letter(s)
letter_counts$values[which(letter_counts$lengths == max(letter_counts$lengths))]
# [1] "o"
n <- 4  # number of words to draw at random
paste(sample(words, n), collapse = " ")
# [1] "took a year journey" (or any other random combination of "n" words)
words[which(nchar(words) == max(nchar(words)))] # longest word
# [1] "journey"
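A few base-R details behind the steps above, shown on small made-up vectors (a sketch; the vectors chars and tied are invented for illustration). rle() only merges adjacent equal values, which is why the letters are sorted before counting; the which(... == max(...)) idiom keeps every tied letter, whereas which.max() keeps only the first; and sample() without replacement stops with an error if asked for more items than it has.

```r
# 1. rle() encodes runs of consecutive equal values, so unsorted input
#    splits the same letter into several runs; sorting groups them first
chars <- c("o", "h", "o", "o")
rle(chars)$values        # "o" "h" "o"
counts <- rle(sort(chars))
counts$values            # "h" "o"
counts$lengths           # 1 3

# 2. With a tie for the highest count, which(... == max(...)) returns
#    every tied letter, while which.max() returns only the first one
tied <- rle(sort(c("b", "a", "c", "a", "b")))
tied$values[which(tied$lengths == max(tied$lengths))]  # "a" "b"
tied$values[which.max(tied$lengths)]                   # "a"

# 3. sample() without replacement fails when size > length(x)
#    ("cannot take a sample larger than the population when 'replace = FALSE'"),
#    so cap the number of words drawn
words <- c("The", "long", "journey", "home", "took", "a", "year")
n <- 10
phrase <- paste(sample(words, min(n, length(words))), collapse = " ")
phrase
```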
Is that what you are looking for?
new_data <- dplyr::sample_n(data.frame(stringr::sentences), 30)
new_data
split_data <- data.frame(X = stringr::str_remove_all(new_data$stringr..sentences, "[.,]"))
max_len <- 10
# needs R >= 4.1 for the native pipe |> and the \(x) lambda syntax
split_data$X |>
  stringr::str_split("[:space:]") |>
  purrr::map(\(words) {
    # order the words of one sentence by their length
    words_sorted <- words[order(nchar(words))]
    tibble::tibble(
      word = words_sorted,
      word_length = nchar(words_sorted)
    ) |>
      dplyr::filter(word_length <= max_len) |>
      dplyr::group_by(word_length) |>
      dplyr::sample_n(1) |>  # one random word per word length
      dplyr::pull(word) |>
      paste0(collapse = " ")
  })
For example, for the sentence
"He picked up the dice for a second roll."
it returns one random word per word length:
"a up the dice picked"
To control the length of the longest word allowed, change max_len.
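For comparison, here is a base-R sketch of the same idea without the dplyr, purrr, stringr and tibble dependencies. The function name random_phrase() and the example vector are made up for illustration; like the pipeline above, it assumes words are separated by whitespace and that only periods and commas need removing.

```r
# Base-R sketch: one random word per distinct word length, capped at max_len
random_phrase <- function(sentence, max_len = 10) {
  # remove periods and commas, then split on runs of whitespace
  words <- strsplit(gsub("[.,]", "", sentence), "[[:space:]]+")[[1]]
  words <- words[nzchar(words) & nchar(words) <= max_len]
  # split() turns the integer lengths into a factor whose levels are sorted
  # numerically, so the groups come out shortest first
  by_length <- split(words, nchar(words))
  # pick one word at random (by index) from each length group
  picked <- vapply(by_length, function(w) w[sample.int(length(w), 1L)], character(1))
  paste(picked, collapse = " ")
}

examples <- c("He picked up the dice for a second roll.",
              "The young prince became heir to the throne.")
set.seed(1)
vapply(examples, random_phrase, character(1), USE.NAMES = FALSE)
```

Because the words are drawn at random, the exact phrases change between runs; calling set.seed() first makes them reproducible.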