Split a string by multiple delimiters
I want to split a string by whitespace, commas, and dots. Given this input:
"hello this is a hello, allright this is a hello."
I want to output:
```
hello 3
a 2
is 2
this 2
allright 1
```
I tried:
```ruby
puts "Enter string "
text = gets.chomp
frequencies = Hash.new(0)
delimiters = [',', ' ', "."]
words = text.split(Regexp.union(delimiters))
words.each { |word| frequencies[word] += 1 }
frequencies = frequencies.sort_by { |a, b| b }
frequencies.reverse!
frequencies.each { |wor, freq| puts "#{wor} #{freq}" }
```
This outputs:
```
hello 3
a 2
is 2
this 2
allright 1
 1
```
I do not want the last line of the output. It seems to count the gap as a word too (the last line is an empty word with a count of 1). This may be because there are consecutive delimiters (`,` and `" "`).
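To see where the blank entry comes from, here is a minimal reproduction using only the part of the input that contains `, `. `Regexp.union(delimiters)` matches exactly one delimiter character, so the comma and the following space are two separate matches, and `String#split` returns the empty string that sits between them. The shortened input is just for illustration.

```ruby
# Same delimiters as in the question: comma, space and full stop.
delimiters = [',', ' ', '.']
# Regexp.union escapes each string and matches ONE delimiter character at a time.
pattern = Regexp.union(delimiters)

# ", " is two delimiters in a row, so split yields an empty string between them.
parts = 'hello, allright'.split(pattern)
p parts # => ["hello", "", "allright"]
```

That empty string is what the frequency hash ends up counting as a "word".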
Use a regex:
```ruby
str = 'hello this is a hello, allright this is a hello.'
str.split(/[.,\s]+/)
# => ["hello", "this", "is", "a", "hello", "allright", "this", "is", "a", "hello"]
```
This allows you to split a string on any of the three delimiters you asked for.
The full stop and comma are self-explanatory, and `\s` matches any whitespace character. The `+` means we match one or more of these in a row, so a run of two or more delimiters (such as `, `) counts as a single separator and does not produce empty strings.
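The effect of `+` can be checked directly. This small sketch splits a shortened sample string with and without the quantifier (the sample strings are made up for illustration). It also shows one case the `+` does not cover: a delimiter at the very start of the string still yields a leading empty string, because `String#split` keeps leading empty fields and only drops trailing ones. Collecting the words with `scan` and the negated class `[^.,\s]+` is an alternative added here, not part of the original answer.

```ruby
str = 'hello, allright.'

# Without "+": every delimiter is its own separator, so ", " leaves an empty string.
without_plus = str.split(/[.,\s]/)
# With "+": a run of delimiters counts as one separator.
with_plus = str.split(/[.,\s]+/)

p without_plus # => ["hello", "", "allright"]
p with_plus    # => ["hello", "allright"]

# Caveat: a delimiter at the very START still produces a leading "".
leading = ' hello'.split(/[.,\s]+/)
p leading      # => ["", "hello"]

# Alternative: match the words themselves instead of the separators,
# which never returns empty strings.
scanned = ' hello, allright.'.scan(/[^.,\s]+/)
p scanned      # => ["hello", "allright"]
```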
You might find the explanation provided by Regex101 handy: https://regex101.com/r/r4M7KQ/3
Edit: for bonus points, here's a nice way to get the word counts using `each_with_object` :)
```ruby
str.split(/[.,\s]+/).each_with_object(Hash.new(0)) { |word, counter| counter[word] += 1 }
# => {"hello"=>3, "this"=>2, "is"=>2, "a"=>2, "allright"=>1}
```
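If you also want the sorted listing from the question, a minimal sketch could look like this. It assumes Ruby 2.7 or later for `Enumerable#tally`, which does the same counting as the `each_with_object(Hash.new(0))` version above (on older Rubies, swap that line back). `word_frequencies` is a hypothetical helper name, and breaking ties alphabetically is an assumption chosen because it reproduces the order of the desired output.

```ruby
# Hypothetical helper: returns [word, count] pairs, most frequent first.
def word_frequencies(text)
  # Split on runs of full stops, commas and whitespace.
  words = text.split(/[.,\s]+/)
  # Drop the "" that a leading delimiter would produce.
  words = words.reject(&:empty?)
  # Enumerable#tally (Ruby 2.7+) builds a Hash of word => count.
  counts = words.tally
  # Highest count first; equal counts in alphabetical order.
  counts.sort_by { |word, count| [-count, word] }
end

word_frequencies('hello this is a hello, allright this is a hello.').each do |word, count|
  puts "#{word} #{count}"
end
# hello 3
# a 2
# is 2
# this 2
# allright 1
```

Sorting on the array `[-count, word]` gives a descending count with an ascending word as the tie-breaker in a single `sort_by`, which avoids the separate `reverse!` step from the question.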