
Split a string by multiple delimiters

I want to split a string by whitespace, commas, and dots. Given this input:

"hello this is a hello, allright this is a hello."

I want to output:

hello 3
a 2
is 2
this 2
allright 1

I tried:

puts "Enter string "
text=gets.chomp
frequencies=Hash.new(0)
delimiters = [',', ' ', "."]
words = text.split(Regexp.union(delimiters))
words.each { |word| frequencies[word] +=1}
frequencies=frequencies.sort_by {|a,b| b}
frequencies.reverse!
frequencies.each { |wor,freq| puts "#{wor} #{freq}"}

This outputs:

hello 3
a 2
is 2
this 2
allright 1
 1

I do not want the last line of the output. It seems to treat the space as a word too. This may be because there were consecutive delimiters (a "," followed by a " ").

Use a regex:

str = 'hello this is a hello, allright this is a hello.'
str.split(/[.,\s]+/)
# => ["hello", "this", "is", "a", "hello", "allright", "this", "is", "a", "hello"]

This allows you to split a string by any of the three delimiters you've requested.

The full stop and comma are self-explanatory, and \s matches whitespace. The + means we match one or more of these characters in a row, which avoids empty strings when two or more delimiters appear in sequence.
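To make the effect of the + concrete, here is a small comparison. The variable `text` is a shortened stand-in for the input in the question, and the names `single` and `grouped` are only for illustration:

```ruby
# Shortened, illustrative input: a comma immediately followed by a space.
text = 'a hello, allright'

# Without +, each delimiter is matched on its own, so the "," and the
# following " " are two separate split points with an empty string between them.
single = text.split(/[.,\s]/)
# => ["a", "hello", "", "allright"]

# With +, the run ", " is consumed as one separator.
grouped = text.split(/[.,\s]+/)
# => ["a", "hello", "allright"]
```

That empty string is what shows up as the stray " 1" line in the question's output.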

You might find the explanation provided by Regex101 handy; it's available here: https://regex101.com/r/r4M7KQ/3


Edit: for bonus points, here's a nice way to get the word counts using each_with_object :)

str.split(/[.,\s]+/).each_with_object(Hash.new(0)) { |word, counter| counter[word] += 1 }
# => {"hello"=>3, "this"=>2, "is"=>2, "a"=>2, "allright"=>1}
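If you also want the printed list from the question, one possible way to finish is sketched below. The alphabetical tie-break is an assumption inferred from the desired output (a, is, this), and the `reject(&:empty?)` step is an extra guard that the sample input does not strictly need:

```ruby
str = 'hello this is a hello, allright this is a hello.'

counts = str.split(/[.,\s]+/)
            .reject(&:empty?) # a leading delimiter would otherwise yield ""
            .each_with_object(Hash.new(0)) { |word, counter| counter[word] += 1 }

# Sort by descending count, then alphabetically for ties (assumed ordering).
sorted = counts.sort_by { |word, count| [-count, word] }

lines = sorted.map { |word, count| "#{word} #{count}" }
puts lines
# hello 3
# a 2
# is 2
# this 2
# allright 1
```

Negating the count inside `sort_by` avoids the separate `reverse!` call, which would also reverse the order of words with equal counts.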
