
How do I choose a random 1-letter word, 2-letter word, 3-letter word, ..., up to the longest word from each sentence in R?

I'm trying to pick a random 1-letter word, 2-letter word, 3-letter word, ..., up to the word with the most letters from each sentence, and then combine these words, separated by spaces, into a phrase.

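library(dplyr)
library(stringr)

new_data <- sample_n(data.frame(stringr::sentences), 30)
new_data

split_data <- data.frame(X = str_remove_all(new_data$stringr..sentences, "[.,]"))
split_data

split_data <- strsplit(split_data$X, " ")
split_data

# length of every word in every sentence (one integer vector per sentence);
# the original for loop recomputed this same result on every iteration
word_lengths <- lapply(split_data, nchar)
word_lengths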
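The code above gets as far as the word lengths. The missing step is grouping each sentence's words by those lengths. A minimal base-R sketch, assuming split_data is the list of word vectors produced by strsplit(); the example sentence is hard-coded here so the block runs on its own:

```r
# one sentence, already split into words as strsplit() does above
words <- strsplit("The long journey home took a year", " ")[[1]]

# split() groups the words by their length; the groups are
# named "1", "3", "4", "7" and ordered from shortest to longest
by_length <- split(words, nchar(words))
by_length

# for the whole list from strsplit(), the same step per sentence would be:
# lapply(split_data, function(w) split(w, nchar(w)))
```

From each group you then only need to draw one word.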

It should have an output like this:

The sentences I randomly selected are:

"The long journey home took a year."

"The young prince became heir to the throne."

The generated phrases are:

“a The year journey”

“to the heir young became”
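Put differently, the expected output keeps one randomly chosen word for each word length that occurs in the sentence, ordered from shortest to longest. A minimal base-R sketch of that rule; random_phrase is a hypothetical helper name, and stripping only "." and "," mirrors the question's code:

```r
# Hypothetical helper: one random word per distinct word length, shortest first
random_phrase <- function(sentence) {
  # remove the punctuation the question strips, then split on whitespace
  words <- strsplit(gsub("[.,]", "", sentence), "\\s+")[[1]]
  # tapply() groups the words by nchar() and draws one word from each group;
  # the groups (and therefore the result) are ordered by word length
  picked <- tapply(words, nchar(words), function(w) w[sample.int(length(w), 1)])
  paste(picked, collapse = " ")
}

set.seed(42)
random_phrase("The long journey home took a year.")
random_phrase("The young prince became heir to the throne.")
```

To run it over the sampled sentences, something like vapply(new_data$stringr..sentences, random_phrase, character(1)) should work. sample.int() is used instead of sample() so that a group containing a single word is handled safely.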

It is not completely clear what you want to achieve as a final result: you mention a few things, but your expected output only shows a few randomized words.

Here are a few examples of what you can do with the sentence, based on words and on characters:

library(stringi)

x <- "The long journey home took a year."

words <- stri_extract_all_words(x)[[1]]
words
# [1] "The"     "long"    "journey" "home"    "took"    "a"       "year" 

all_letters <- unlist(strsplit(words, ""))
all_letters
# [1] "T" "h" "e" "l" "o" "n" "g" "j" "o" "u" "r" "n" "e" "y" "h" "o" "m" "e" "t" "o" "o" "k" "a" "y" "e" "a" "r"

letter_counts <- rle(sort(stri_trans_tolower(all_letters)))
letter_counts

# Run Length Encoding
#   lengths: int [1:14] 2 4 1 2 1 1 1 1 2 5 ...
#   values : chr [1:14] "a" "e" "g" "h" "j" "k" "l" "m" "n" "o" "r" "t" "u" "y"

l <- 3
sample(letter_counts$values, l)
# [1] "m" "h" "j"

# most frequently occurring letter
letter_counts$values[which(letter_counts$lengths == max(letter_counts$lengths))]
# [1] "o"
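As an alternative to rle(sort(...)), base R's table() counts the letters directly, and the same max() comparison keeps any ties. A small sketch with the word vector typed out so it runs without stringi:

```r
words <- c("The", "long", "journey", "home", "took", "a", "year")

# count every letter, ignoring case
letter_table <- table(tolower(unlist(strsplit(words, ""))))

# all letters that share the highest count
names(letter_table)[letter_table == max(letter_table)]
# [1] "o"
```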

n <- 4
paste(sample(words, n), collapse = " ")
# [1] "took a year journey" (or any other random combination of "n" words)

words[which(nchar(words) == max(nchar(words)))] # longest word
# [1] "journey"
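The comparison with max() returns every word tied for the longest length. which.max() is shorter but keeps only the first match, which matters when several words share the maximum length, as in this sentence from the second answer:

```r
words <- c("He", "picked", "up", "the", "dice", "for", "a", "second", "roll")

# every word tied for the maximum length
words[nchar(words) == max(nchar(words))]
# [1] "picked" "second"

# only the first word of maximum length
words[which.max(nchar(words))]
# [1] "picked"
```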

Is that what you are looking for?

new_data <- dplyr::sample_n(data.frame(stringr::sentences), 30)
new_data

split_data <- data.frame(X = stringr::str_remove_all(new_data$stringr..sentences, "[.,]"))

# longest word length that may be picked
max_len <- 10

# \(words) is the R >= 4.1 shorthand for function(words), like the native pipe |>
split_data$X |>
  stringr::str_split("[:space:]") |>
  purrr::map(\(words) {
    words_sorted <- words[order(nchar(words))]

    tibble::tibble(
      word = words_sorted,
      word_length = nchar(words_sorted)
    ) |>
      dplyr::filter(word_length <= max_len) |>
      dplyr::group_by(word_length) |>
      dplyr::sample_n(1) |>        # one random word per length
      dplyr::pull(word) |>
      paste0(collapse = " ")
  })

For example, for the sentence:

"He picked up the dice for a second roll."

it returns one random word per length:

"a up the dice picked"

If you want to control the length of the longest word that can be picked, change max_len.
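max_len only drops words longer than the limit before grouping. If you also want a lower bound, the same kind of filter can be applied to the word vector first. A base-R sketch using the example sentence above, where min_len is a hypothetical extra parameter not present in the answer's code:

```r
words <- c("He", "picked", "up", "the", "dice", "for", "a", "second", "roll")
min_len <- 2  # hypothetical lower bound
max_len <- 4

# keep only words whose length lies within [min_len, max_len]
kept <- words[nchar(words) >= min_len & nchar(words) <= max_len]

# one random word per remaining length, shortest first
picked <- tapply(kept, nchar(kept), function(w) w[sample.int(length(w), 1)])
paste(picked, collapse = " ")
```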
