Splitting a string into words and punctuation with Ruby

I'm working in Ruby, and I want to split a string into an array of its words and punctuation, treating apostrophes and hyphens as parts of words. For example,

s = "here...is a     happy-go-lucky string that I'm writing"

should become

["here", "...", "is", "a", "happy-go-lucky", "string", "that", "I'm", "writing"].

This is the closest I've gotten so far, but it is still inadequate because it doesn't treat hyphens and apostrophes as part of the word:

s.scan(/\w+|\W+/).select {|x| x.match(/\S/)}

which yields

["here", "...", "is", "a", "happy", "-", "go", "-", "lucky", "string", "that", "I", "'", "m", "writing"]

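The attempt breaks words at apostrophes and hyphens because `\w` only covers letters, digits, and underscore. A small sketch (not part of any answer) that checks character-class membership and reproduces the attempt on the question's string:

```ruby
# \w matches letters, digits, and underscore only, so an apostrophe
# or hyphen ends the current \w+ match and is picked up by \W+ instead.
s = "here...is a     happy-go-lucky string that I'm writing"

p "'".match?(/\w/)  # => false
p "-".match?(/\w/)  # => false
p "_".match?(/\w/)  # => true

# Reproducing the attempt: "-" and "'" come out as separate tokens.
p s.scan(/\w+|\W+/).select { |x| x.match(/\S/) }
```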

You can try the following:

s.scan(/[\w'-]+|[[:punct:]]+/)
#=> ["here", "...", "is", "a", "happy-go-lucky", "string", "that", "I'm", "writing"]
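A minimal sketch of the same scan wrapped in a helper and tried on a couple of made-up inputs. The name `tokenize` is hypothetical; only the regex comes from the answer above.

```ruby
# Hypothetical helper wrapping the scan from the answer above.
# At each position [\w'-]+ is tried first, so apostrophes and hyphens
# inside words stay attached; otherwise [[:punct:]]+ takes a run of
# punctuation. Whitespace matches neither alternative, so scan skips it.
def tokenize(text)
  text.scan(/[\w'-]+|[[:punct:]]+/)
end

p tokenize("here...is a     happy-go-lucky string that I'm writing")
p tokenize("Wait, what?!")             # => ["Wait", ",", "what", "?!"]
p tokenize("It's 9:30; don't panic.")  # => ["It's", "9", ":", "30", ";", "don't", "panic", "."]
```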

You were close:

s.scan(/[\w'-]+|[.,!?]+/)

The idea is to match either a word, which may contain ' or -, or a run of punctuation characters from the list [.,!?].
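One difference from the `[[:punct:]]` version: an explicit list like `[.,!?]` only recognizes the characters it names, and `scan` silently skips anything that matches neither alternative. A small comparison on a made-up sample string:

```ruby
text = "Wait; really: yes!"

# Whitelisted punctuation: ';' and ':' match neither alternative, so scan drops them.
whitelist = text.scan(/[\w'-]+|[.,!?]+/)

# POSIX punctuation class: every ASCII punctuation character is kept.
posix = text.scan(/[\w'-]+|[[:punct:]]+/)

p whitelist  # => ["Wait", "really", "yes", "!"]
p posix      # => ["Wait", ";", "really", ":", "yes", "!"]
```

If you prefer the whitelist, extending it (for example `[.,!?;:]+`) keeps those characters.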

After nearly giving up and then tinkering some more, I appear to have solved the puzzle. This seems to work:

s.scan(/[\w'-]+|\W+/).select {|x| x.match(/\S/)}

It yields

["here", "...", "is", "a", "happy-go-lucky", "string", "that", "I'm", "writing"]
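A caveat with `\W+`: it matches whitespace too, so punctuation directly followed by a space comes back with the space attached, and the `select` step only removes tokens that are entirely whitespace. A sketch on a made-up string, comparing it with `[^\w\s]+` (characters that are neither word characters nor whitespace), which also makes the `select` unnecessary:

```ruby
text = "here... is a happy-go-lucky string"

# \W+ also consumes the space after "...", producing the token "... ".
with_select = text.scan(/[\w'-]+|\W+/).select { |x| x.match(/\S/) }

# [^\w\s]+ excludes whitespace, so spaces never match either
# alternative and scan skips them; no select step is needed.
without_select = text.scan(/[\w'-]+|[^\w\s]+/)

p with_select     # => ["here", "... ", "is", "a", "happy-go-lucky", "string"]
p without_select  # => ["here", "...", "is", "a", "happy-go-lucky", "string"]
```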

Is there an even cleaner way to do it, though, without having to use #select?

Use the split method.

Example:

str = "word, anotherWord, foo"
puts str.split(", ")

It prints

word
anotherWord
foo

Hope it works for you!

You can also check this: http://ruby.about.com/od/advancedruby/a/split.htm
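Applied to the question's string, `split` needs a bit more, since splitting on whitespace alone would leave `...` glued to `here`. When the pattern contains a capture group, Ruby's `String#split` also returns the captured separators, so punctuation can be kept as tokens. A sketch along those lines; the regex is an assumption tuned to this example, not something taken from the answer above:

```ruby
s = "here...is a     happy-go-lucky string that I'm writing"

# Split on runs of whitespace, or on runs of characters that are not
# word characters, whitespace, apostrophes, or hyphens. Only the second
# alternative is captured, so punctuation runs are kept in the result
# and whitespace is discarded.
pattern = /\s+|([^\w\s'-]+)/

tokens = s.split(pattern)
p tokens  # => ["here", "...", "is", "a", "happy-go-lucky", "string", "that", "I'm", "writing"]

# Caveat: punctuation followed directly by whitespace leaves an empty
# field between the two separators.
p "here... is".split(pattern)  # => ["here", "...", "", "is"]
```

Because of those empty fields (and a leading "" when the string starts with a separator), a general-purpose version would still need something like `reject(&:empty?)`, so `scan` is usually the simpler tool here.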
