How can I randomly select sentences in R, count the characters in each word, and sort the text by those counts?

Question

First, I want to randomly select 5 or 6 sentences. After that, I want to write a function that counts the letters in every word of a given text and sorts the text by those counts, from words with the fewest letters to words with the most. Words with the same number of letters should be sorted alphabetically.

[1] "We find joy in the simplest things. He wrote down a long list of items. The hail pattered on the burnt brown grass. Screen the porch with woven straw mats. The theft of the pearl pin was kept secret. Sweet words work better than fierce."

The function should return a result like this:

[1] "a he in of of on we joy pin the the the the the the was down find hail kept list long mats than with work brown burnt grass items pearl porch straw sweet theft words woven wrote better fierce screen secret things pattered simplest"
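The first step in the question, drawing 5 or 6 sentences at random, can be sketched with base R's sample(). This sketch assumes the candidate sentences are available as one text in which every sentence ends with a full stop; the question's example text is reused as the pool, so with only six sentences the draw covers almost the whole pool, and a real pool would presumably be larger. The names pool, picked and selected_text are illustrative, not from the question.

```r
# Assumption: the pool of candidate sentences is the question's example text.
text <- "We find joy in the simplest things. He wrote down a long list of items. The hail pattered on the burnt brown grass. Screen the porch with woven straw mats. The theft of the pearl pin was kept secret. Sweet words work better than fierce."

# Split into sentences at whitespace that follows a full stop
# (PCRE lookbehind, so the "." stays attached to its sentence)
pool <- strsplit(text, "(?<=\\.)\\s+", perl = TRUE)[[1]]

set.seed(1)                # make the random draw reproducible
n <- sample(5:6, 1)        # decide whether to take 5 or 6 sentences
picked <- sample(pool, n)  # draw n sentences without replacement (the default)

# Rejoin the chosen sentences into one text for the sorting step
selected_text <- paste(picked, collapse = " ")
selected_text
```

Any of the sorting approaches in the answers can then be applied to selected_text instead of the full example text.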

Answer 1

One approach with base R:

sentence <- 
"We find joy in the simplest things. He wrote down a long list of items. 
The hail pattered on the burnt brown grass. Screen the porch with woven straw mats.
The theft of the pearl pin was kept secret. Sweet words work better than fierce."

## needs R >= 4.1 for the native pipe |> and the \(x) lambda syntax
sentence |>
    strsplit('\\W+?') |>               ## split at non-word characters (leaves some empty strings)
    unlist() |>
    (\(.) .[. != ""])() |>             ## remove empty strings
    (\(.) .[order(nchar(.), .)])() |>  ## sort by string length, then alphabetically
    paste(collapse = ' ')

Output:

[1] "a He in of of on We joy pin the the the the The The was down find hail kept list long mats than with work brown burnt grass items pearl porch straw Sweet theft words woven wrote better fierce Screen secret things pattered simplest"

Note that there is some possibly unfamiliar notation, such as (\(.)...)(). This is shorthand for defining and executing an anonymous function:

function(x){...} can be written as \(x){...} (available since R 4.1).
(\(x){...})() defines and executes the function; if you put this construct in a ... |> ... |> pipeline, x is the incoming value.
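A small self-contained illustration of the two forms just described (R 4.1 or later). The vector x and the function names are made up for the example.

```r
x <- c("joy", "", "a", "", "the")

# \(v) is shorthand for function(v): both objects behave identically
drop_empty_long  <- function(v) v[v != ""]
drop_empty_short <- \(v) v[v != ""]

# Parenthesizing the lambda and adding () turns it into a call the pipe can
# feed: the piped value becomes the lambda's argument v
piped <- x |> (\(v) v[v != ""])()
piped  # c("joy", "a", "the")
```

The outer parentheses are what make this work in a pipe: the right-hand side of |> must be a function call, and (\(v) ...)() is one.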

Answer 2

A similar base R approach:

str <- "We find joy in the simplest things. He wrote down a long list of items.
        The hail pattered on the burnt brown grass. Screen the porch with woven
        straw mats. The theft of the pearl pin was kept secret. 
        Sweet words work better than fierce."

# [[:punct:]]$ also strips the final full stop, which has no whitespace after it
words <- strsplit(str, "[[:punct:]]?\\s+[[:punct:]]?|[[:punct:]]$")[[1]]

split(words, nchar(words)) |>
  lapply(sort) |>
  unlist() |>
  paste(collapse = " ")
  
#> [1] "a He in of of on We joy pin the the the the The The was down find hail
#> kept list long mats than with work brown burnt grass items pearl porch 
#> straw Sweet theft words woven wrote better fierce Screen secret things 
#> pattered simplest"
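This approach relies on split() returning its groups in increasing word length. That holds because the grouping key nchar(words) is an integer vector: split() converts it with as.factor(), whose levels come from sorting the unique values numerically. If the key were converted to character first, the groups would come out in string order, which matters once a text has words of 10 or more letters. A minimal check with made-up words:

```r
w <- c("straightforward", "joy", "a", "be", "ab")

# Integer key: groups are ordered numerically by word length
by_len <- split(w, nchar(w))
names(by_len)    # "1" "2" "3" "15"

# Character key: groups are ordered as strings, so "15" comes before "2"
by_label <- split(w, as.character(nchar(w)))
names(by_label)  # "1" "15" "2" "3"

# Same pipeline shape as the answer, applied to the integer-keyed groups
paste(unlist(lapply(by_len, sort)), collapse = " ")  # "a ab be joy straightforward"
```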

Answer 3

library(tokenizers)

text =  "We find joy in the simplest things. He wrote down a long list of items. The hail pattered on the burnt brown grass. Screen the porch with woven straw mats. The theft of the pearl pin was kept secret. Sweet words work better than fierce."

sort_count <- function(s){
  words <- tokenize_words(s, simplify = TRUE)  # lowercase words, punctuation stripped
  words[order(nchar(words), words)]            # by length, then alphabetically
} 
 
sort_count(text)
#>  [1] "a"        "he"       "in"       "of"       "of"       "on"      
#>  [7] "we"       "joy"      "pin"      "the"      "the"      "the"     
#> [13] "the"      "the"      "the"      "was"      "down"     "find"    
#> [19] "hail"     "kept"     "list"     "long"     "mats"     "than"    
#> [25] "with"     "work"     "brown"    "burnt"    "grass"    "items"   
#> [31] "pearl"    "porch"    "straw"    "sweet"    "theft"    "words"   
#> [37] "woven"    "wrote"    "better"   "fierce"   "screen"   "secret"  
#> [43] "things"   "pattered" "simplest"
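The function above returns a character vector, whereas the question's expected output is a single space-separated string. A base-R sketch without the tokenizers dependency is below; sort_words is a hypothetical name, and it assumes a word is any run of letters (so a contraction such as "don't" would become two tokens). It lowercases first, as tokenize_words() does, and passes method = "radix" to order() so that strings are compared in the C locale and ties break the same way on every machine.

```r
# Hypothetical helper: sort the words of a text by length, then alphabetically,
# and return them as one space-separated string.
sort_words <- function(text) {
  # lowercase, then split on runs of anything that is not a letter
  words <- unlist(strsplit(tolower(text), "[^[:alpha:]]+"))
  words <- words[words != ""]  # drop an empty token if the text starts with a non-letter
  # order by length, then alphabetically; "radix" compares strings in the C locale
  ord <- order(nchar(words), words, method = "radix")
  paste(words[ord], collapse = " ")
}

text <- "We find joy in the simplest things. He wrote down a long list of items. The hail pattered on the burnt brown grass. Screen the porch with woven straw mats. The theft of the pearl pin was kept secret. Sweet words work better than fierce."
sort_words(text)
```

On the example text this should reproduce the expected output from the question exactly, since every word is lowercased before sorting.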

3 answers

Answer 1
1 2022-12-27 19:14:00

Answer 2
1 2022-12-27 19:21:58

Answer 3
1 2022-12-27 19:25:43
