
How do I randomly select sentences in R, count the number of characters in each word, and sort the text by those counts?

First, I want to select 5 or 6 sentences at random. After that, I want to write a function that counts the letters in every word of a given text and sorts the text by those counts, from words with few letters to words with many letters. Words containing the same number of letters should be sorted alphabetically.

[1] "We find joy in the simplest things. He wrote down a long list of items. The hail pattered on the burnt brown grass. Screen the porch with woven straw mats. The theft of the pearl pin was kept secret. Sweet words work better than fierce." 

The function should return a result like this:

[1] "a he in of of on we joy pin the the the the the the was down find hail kept list long mats than with work brown burnt grass items pearl porch straw sweet theft words woven wrote better fierce screen secret things pattered simplest" 
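None of the answers below cover the first step, choosing the sentences at random. A minimal base-R sketch, assuming the candidates are already stored one per element in a character vector; the six-sentence pool is only a hypothetical stand-in built from the sample text, and a real pool would normally be larger:

```r
# Hypothetical pool of candidate sentences (one sentence per element)
pool <- c(
  "We find joy in the simplest things.",
  "He wrote down a long list of items.",
  "The hail pattered on the burnt brown grass.",
  "Screen the porch with woven straw mats.",
  "The theft of the pearl pin was kept secret.",
  "Sweet words work better than fierce."
)

set.seed(42)                           # make the random draw reproducible
n <- sample(5:6, 1)                    # decide how many sentences to take: 5 or 6
picked <- sample(pool, n)              # draw n distinct sentences (no replacement)
text <- paste(picked, collapse = " ")  # join them into a single text
text
```

sample() draws without replacement by default, so no sentence is picked twice; removing set.seed() gives a different selection on every run.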

One approach with base R:

sentence <- 
"We find joy in the simplest things. He wrote down a long list of items. 
The hail pattered on the burnt brown grass. Screen the porch with woven straw mats.
The theft of the pearl pin was kept secret. Sweet words work better than fierce."

sentence |>
    strsplit('\\W+?') |>               ## split at non-word characters
    unlist() |>
    (\(.) .[. != ""])() |>             ## remove empty strings
    (\(.) .[order(nchar(.), .)])() |>  ## sort by string length, then alphabetically
    paste(collapse = ' ')

Output:

[1] "a He in of of on We joy pin the the the the The The was down find hail kept list long mats than with work brown burnt grass items pearl porch straw Sweet theft words woven wrote better fierce Screen secret things pattered simplest"

Note that there is some possibly unfamiliar notation, such as (\(.)...)(). This is shorthand for defining and immediately executing an anonymous function:

  • function(x){...} can be written as \(x){...} (this shorthand, like the native |> pipe, requires R 4.1.0 or later)
  • (\(x){...})() defines the function and calls it at once; when the construct is placed in a ... |> ... pipeline, x is the value coming in from the left-hand side of the pipe
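A small illustration of the two points above (requires R 4.1.0 or later; the names square_long and square_short are purely illustrative):

```r
# The same function written with the long and the short syntax
square_long  <- function(x) x^2
square_short <- \(x) x^2
stopifnot(identical(square_long(3), square_short(3)))

# Define an anonymous function and call it immediately
(\(x) x + 1)(41)

# Inside a pipeline, the value on the left of |> becomes the argument x
c("b", "", "a") |>
  (\(x) x[x != ""])() |>  # drop empty strings
  sort()
```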

A similar base R approach:

str <- "We find joy in the simplest things. He wrote down a long list of items.
        The hail pattered on the burnt brown grass. Screen the porch with woven
        straw mats. The theft of the pearl pin was kept secret. 
        Sweet words work better than fierce."

# split at any run of punctuation and/or whitespace, so the final "fierce."
# loses its period too
words <- strsplit(str, "[[:punct:][:space:]]+")[[1]]

split(words, nchar(words)) |>  # group words by length
  lapply(sort) |>              # sort alphabetically within each group
  unlist() |>
  paste(collapse = " ")

#> [1] "a He in of of on We joy pin the the the the The The was down find hail
#> kept list long mats than with work brown burnt grass items pearl porch
#> straw Sweet theft words woven wrote better fierce Screen secret things
#> pattered simplest"
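The approach above relies on split() returning its groups in order of word length. Because nchar() returns integers and split() converts them with as.factor(), the groups are ordered numerically rather than as text. A quick check with made-up words of length 1, 3 and 10:

```r
w <- c("abcdefghij", "a", "abc")
groups <- split(w, nchar(w))

# Group names follow numeric order: "1", "3", "10" (not "1", "10", "3")
names(groups)

# Flattening therefore lists the shortest words first
unlist(groups, use.names = FALSE)
```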
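Using the tokenizers package (tokenize_words() lowercases and strips punctuation by default):

library(tokenizers)

text <- "We find joy in the simplest things. He wrote down a long list of items. The hail pattered on the burnt brown grass. Screen the porch with woven straw mats. The theft of the pearl pin was kept secret. Sweet words work better than fierce."

sort_count <- function(s) {
  # split the argument (not the global `text`) into lowercase words
  words <- tokenize_words(s, simplify = TRUE)
  # order by word length, then alphabetically
  words[order(nchar(words), words)]
}

sort_count(text)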
#>  [1] "a"        "he"       "in"       "of"       "of"       "on"      
#>  [7] "we"       "joy"      "pin"      "the"      "the"      "the"     
#> [13] "the"      "the"      "the"      "was"      "down"     "find"    
#> [19] "hail"     "kept"     "list"     "long"     "mats"     "than"    
#> [25] "with"     "work"     "brown"    "burnt"    "grass"    "items"   
#> [31] "pearl"    "porch"    "straw"    "sweet"    "theft"    "words"   
#> [37] "woven"    "wrote"    "better"   "fierce"   "screen"   "secret"  
#> [43] "things"   "pattered" "simplest"
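The question asks for one lowercase, space-separated string, whereas the answers above return either mixed-case text or a character vector. A minimal base-R sketch that does both; sort_by_length is a hypothetical name, and it assumes that any run of non-letter characters separates words (so a contraction such as "don't" would be split in two):

```r
# Hypothetical helper: sort the words of a text by length, then alphabetically,
# and return them as one lowercase, space-separated string.
sort_by_length <- function(s) {
  # Lowercase first so "The" and "the" are treated as the same word
  words <- strsplit(tolower(s), "[^[:alpha:]]+")[[1]]
  # Drop the empty string produced when the text starts with a non-letter
  words <- words[words != ""]
  # Primary key: number of characters; secondary key: alphabetical order
  paste(words[order(nchar(words), words)], collapse = " ")
}

text <- "We find joy in the simplest things. He wrote down a long list of items. The hail pattered on the burnt brown grass. Screen the porch with woven straw mats. The theft of the pearl pin was kept secret. Sweet words work better than fierce."
sort_by_length(text)
```

With the sample text from the question, this should reproduce the expected result shown there.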
