How do I take random sentences in R, count the number of characters in each word, and sort the words by those counts?
First, I want to select 5 or 6 sentences at random. Then I want to write a function that finds the number of letters in every word of a given text and sorts the words from those with the fewest letters to those with the most. Words with the same number of letters should be sorted alphabetically.
[1] "We find joy in the simplest things. He wrote down a long list of items. The hail pattered on the burnt brown grass. Screen the porch with woven straw mats. The theft of the pearl pin was kept secret. Sweet words work better than fierce."
The function should return a result like this:
[1] "a he in of of on we joy pin the the the the the the was down find hail kept list long mats than with work brown burnt grass items pearl porch straw sweet theft words woven wrote better fierce screen secret things pattered simplest"
One approach with base R:
sentence <-
"We find joy in the simplest things. He wrote down a long list of items.
The hail pattered on the burnt brown grass. Screen the porch with woven straw mats.
The theft of the pearl pin was kept secret. Sweet words work better than fierce."
sentence |>
  strsplit('\\W+') |>                  ## split on runs of non-word characters
  unlist() |>
  (\(.) .[. != ""])() |>               ## remove empty strings, if any
  (\(.) .[order(nchar(.), .)])() |>    ## sort by string length, then alphabetically
  paste(collapse = ' ')
Output:
[1] "a He in of of on We joy pin the the the the The The was down find hail kept list long mats than with work brown burnt grass items pearl porch straw Sweet theft words woven wrote better fierce Screen secret things pattered simplest"
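The output above keeps the original capitalization, while the expected result in the question is all lowercase. Here is a minimal sketch that wraps the same steps in a function and lowercases the text first. The name sort_by_length is hypothetical, not something from the question or the answer.

```r
# Hypothetical helper: lowercase the text, split it into words, then sort
# by word length and, within equal length, alphabetically.
sort_by_length <- function(text) {
  words <- unlist(strsplit(tolower(text), "\\W+"))  # split on runs of non-word characters
  words <- words[words != ""]                       # drop empty strings, if any
  paste(words[order(nchar(words), words)], collapse = " ")
}

sort_by_length("We find joy in the simplest things. He wrote down a long list of items.")
#> [1] "a he in of we joy the down find list long items wrote things simplest"
```

Lowercasing before sorting also sidesteps locale differences in how upper- and lowercase letters are ordered.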
Note the perhaps unfamiliar notation (\(.) ...)(). It is shorthand for defining and immediately calling an anonymous function: function(x){...} can be written as \(x){...} (R 4.1 or later), and (\(x){...})() defines and calls that function. If you put this construct in a ... |> ... |> pipeline, x is the incoming value.
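A small sketch of the shorthand in isolation may help. The names square_long, square_short and result are made up for illustration, and both `\(x)` and `|>` assume R 4.1 or later.

```r
# Two equivalent ways to write the same function.
square_long  <- function(x) x^2   # classic form
square_short <- \(x) x^2          # backslash shorthand

# Define an anonymous function and call it immediately;
# the piped value becomes its argument v.
result <- c(3, 1, 2) |> (\(v) v[order(v)])()
result
#> [1] 1 2 3
```

The extra pair of parentheses at the end is what turns the function definition into a call, which is what the pipe requires on its right-hand side.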
A similar base R approach:
str <- "We find joy in the simplest things. He wrote down a long list of items.
The hail pattered on the burnt brown grass. Screen the porch with woven
straw mats. The theft of the pearl pin was kept secret.
Sweet words work better than fierce."
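# Strip punctuation before splitting on whitespace; splitting on
# punctuation plus whitespace alone leaves the final "fierce." with its period.
words <- strsplit(gsub("[[:punct:]]", "", str), "\\s+")[[1]]
split(words, nchar(words)) |>   # group words by length
  lapply(sort) |>               # sort alphabetically within each group
  unlist() |>
  paste(collapse = " ")
#> [1] "a He in of of on We joy pin the the the the The The was down find hail
#> kept list long mats than with work brown burnt grass items pearl porch
#> straw Sweet theft words woven wrote better fierce Screen secret things
#> pattered simplest"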
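The question also asks for the number of letters in each word, not only the sorted text. A minimal sketch, assuming a short sample string in a hypothetical variable txt, keeps the counts next to the words as a named integer vector:

```r
txt <- "We find joy in the simplest things."

# Lowercase, strip punctuation, split on whitespace.
words <- strsplit(gsub("[[:punct:]]", "", tolower(txt)), "\\s+")[[1]]

# Named vector: names are the words, values are their letter counts.
counts <- setNames(nchar(words), words)

# Sort by count, then alphabetically by word.
sorted_counts <- counts[order(counts, names(counts))]
sorted_counts
#>       in       we      joy      the     find   things simplest
#>        2        2        3        3        4        6        8
```

Calling names(sorted_counts) then gives the sorted words, and paste(names(sorted_counts), collapse = " ") gives the single string the question asks for.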
Using the tokenizers package, whose tokenize_words() lowercases the words and strips punctuation by default:
library(tokenizers)
text <- "We find joy in the simplest things. He wrote down a long list of items. The hail pattered on the burnt brown grass. Screen the porch with woven straw mats. The theft of the pearl pin was kept secret. Sweet words work better than fierce."
sort_count <- function(s) {
  words <- tokenize_words(s, simplify = TRUE)  # use the argument s, not the global `text`
  words[order(nchar(words), words)]            # sort by length, then alphabetically
}
sort_count(text)
#> [1] "a" "he" "in" "of" "of" "on"
#> [7] "we" "joy" "pin" "the" "the" "the"
#> [13] "the" "the" "the" "was" "down" "find"
#> [19] "hail" "kept" "list" "long" "mats" "than"
#> [25] "with" "work" "brown" "burnt" "grass" "items"
#> [31] "pearl" "porch" "straw" "sweet" "theft" "words"
#> [37] "woven" "wrote" "better" "fierce" "screen" "secret"
#> [43] "things" "pattered" "simplest"
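The first step of the question, picking 5 or 6 sentences at random, can be sketched with sample(). The pool below is just the six sentences from the question, and the names pool, n, picked and sorted are hypothetical. A larger pool, for example the sentences vector that ships with the stringr package, could be swapped in.

```r
set.seed(42)  # only to make the sketch reproducible

# Assumed pool of candidate sentences (the six from the question).
pool <- c(
  "We find joy in the simplest things.",
  "He wrote down a long list of items.",
  "The hail pattered on the burnt brown grass.",
  "Screen the porch with woven straw mats.",
  "The theft of the pearl pin was kept secret.",
  "Sweet words work better than fierce."
)

n      <- sample(5:6, 1)      # pick either 5 or 6
picked <- sample(pool, n)     # draw n sentences without replacement

# Lowercase, split into words, sort by length and then alphabetically.
words  <- unlist(strsplit(tolower(paste(picked, collapse = " ")), "\\W+"))
words  <- words[words != ""]
sorted <- paste(words[order(nchar(words), words)], collapse = " ")
sorted
```

No expected output is shown because the result depends on which sentences are drawn.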