
Counting the letters of every word in a given text and sorting from fewest letters to most

I need to take a sample of 5 from the example sentences in the tidyverse. After taking those 5 samples, I need a function that counts the letters in every word of the sample and sorts the text by those counts, from the words with the fewest letters to the words with the most.

You could use the stringr package:

s <- "The first worm gets snapped early. The sink is the thing in which we pile dishes. A big wet stain was on the round carpet. A fence cuts through the corner lot. Peep under the tent and see the clowns. Next Sunday is the twelfth of the month."

words <- unlist(stringr::str_extract_all(s, stringr::boundary("word")))
words[order(nchar(words))]

 [1] "A"       "A"       "is"      "in"      "we"      "on"      "is"      "of"      "The"     "The"     "the"     "big"    
[13] "wet"     "was"     "the"     "the"     "lot"     "the"     "and"     "see"     "the"     "the"     "the"     "worm"   
[25] "gets"    "sink"    "pile"    "cuts"    "Peep"    "tent"    "Next"    "first"   "early"   "thing"   "which"   "stain"  
[37] "round"   "fence"   "under"   "month"   "dishes"  "carpet"  "corner"  "clowns"  "Sunday"  "snapped" "through" "twelfth"
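
The question also asks for a sample of five sentences, which the snippet above replaces with a fixed string. A minimal sketch of that step, assuming the "example sentences in tidyverse" are the `sentences` vector that ships with stringr; the seed and the names `smp` and `sorted_words` are illustrative choices, not part of the original answer:

```r
library(stringr)

set.seed(42)                 # arbitrary seed, only to make the draw reproducible
smp <- sample(sentences, 5)  # assumption: `sentences` is the dataset the question means

# Split all five sentences into words, then order the words by letter count
words <- unlist(str_extract_all(smp, boundary("word")))
sorted_words <- words[order(nchar(words))]
sorted_words
```

Because `order()` is stable, words of equal length keep the order in which they appear in the sampled text.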

1. Sorted by word length only

library(tidyverse)  # loads stringr, purrr and the %>% pipe used below

s       <- "The first worm gets snapped early. The sink is the thing in which we pile dishes. A big wet stain was on the round carpet. A fence cuts through the corner lot. Peep under the tent and see the clowns. Next Sunday is the twelfth of the month."
s_split <- s %>% str_extract_all(stringr::boundary("word")) %>% unlist()

s_split %>% 
  str_length() %>% 
  order() %>% 
  s_split[.] %>% 
  str_c(collapse = " ") %>% 
  str_to_lower()

[1] "a a is in we on is of the the the big wet was the the lot the and see the the the worm gets sink pile cuts peep tent next first early thing which stain round fence under month dishes carpet corner clowns sunday snapped through twelfth"

If you want to analyse multiple strings, use a function:

order_by_length <- function(input) {
  
  s_split <- input %>% str_extract_all(stringr::boundary("word")) %>% unlist()
  
  s_split %>% 
    str_length() %>% 
    order() %>% 
    s_split[.] %>% 
    str_c(collapse = " ") %>% 
    str_to_lower()
  
}

string_1 <- "This is a test string"
string_2 <- "Here we have another sentence as an example"
string_3 <- "Let's demonstrate even a third string"

string_list <- list(string_1, string_2, string_3)
map(string_list, order_by_length)
[[1]]
[1] "a is this test string"

[[2]]
[1] "we as an here have another example sentence"

[[3]]
[1] "a even let's third string demonstrate"
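
If the letter counts themselves are needed, not just the sorted words, they can be kept next to each word. A rough sketch under that assumption; `word_lengths()` is a hypothetical helper, not part of the answer above:

```r
library(stringr)

# Hypothetical helper: one row per word with its letter count, shortest first
word_lengths <- function(input) {
  words <- unlist(str_extract_all(input, boundary("word")))
  out <- data.frame(
    word    = words,
    letters = str_length(words),  # counts every character, including apostrophes
    stringsAsFactors = FALSE
  )
  out[order(out$letters), , drop = FALSE]
}

word_lengths("This is a test string")
```

Note that `str_length()` counts characters, so a word such as "Let's" is reported as 5 even though it has only four letters.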

2. Sorted first by length and then alphabetically

Use split() to group the words by length and str_sort() to sort each group alphabetically:

order_by_length2 <- function(input) {
  
  input %>% 
    str_extract_all(stringr::boundary("word")) %>% 
    unlist() %>% 
    split(f=str_length(.)) %>% 
    map(str_sort) %>% 
    unlist(use.names = FALSE) %>% 
    str_c(collapse = " ") %>% 
    str_to_lower()
  
}
# 1. One string
order_by_length2(s)
[1] "a a in is is of on we and big lot see the the the the the the the the the was wet cuts gets next peep pile sink tent worm early fence first month round stain thing under which carpet clowns corner dishes sunday snapped through twelfth"

# 2. Multiple strings
map(string_list, order_by_length2)
[[1]]
[1] "a is test this string"

[[2]]
[1] "an as we have here another example sentence"

[[3]]
[1] "a even let's third string demonstrate"
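
The same two-level sort can also be written with a single `order()` call that takes the length as the first key and the word as the second. This is only an alternative sketch, and `order_by_length3()` is a hypothetical name. It lower-cases the words before sorting and uses `method = "radix"`, which compares strings in C-locale order, so results may differ from `str_sort()` for non-ASCII text:

```r
library(stringr)

# Hypothetical variant of order_by_length2() built on a two-key order()
order_by_length3 <- function(input) {
  words <- str_to_lower(unlist(str_extract_all(input, boundary("word"))))
  # First key: letter count; second key: the word itself (ties broken alphabetically)
  sorted <- words[order(str_length(words), words, method = "radix")]
  str_c(sorted, collapse = " ")
}

order_by_length3("Here we have another sentence as an example")
# [1] "an as we have here another example sentence"
```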
