
String splitting with unknown punctuation in Ruby

I am building an application that downloads sentences and parses them for a word game. I don't know in advance what punctuation the text will contain.

I'd like to split the sentence(s) into words, check each word's part-of-speech tag, and, if it has the tag I'm looking for, replace the word with " ", then rejoin everything in the original order.

text = "some string, with punctuation- for example: things I don't know about, that may or may not have     whitespaces and random characters % !!"

How can I split it into an array so that I can run the parser over each word and rejoin the words in order, bearing in mind that string.split(//) seems to need to know what punctuation I'm looking for?

split is useful when the delimiters are easier to describe than the parts you want to extract. In your case the reverse is true: the words are easier to describe than the delimiters, so scan is the better fit. Don't use split here; use scan.

text.scan(/[\w']+/)
# => ["some", "string", "with", "punctuation", "for", "example", "things", "I", "don't", "know", "about", "that", "may", "or", "may", "not", "have", "whitespaces", "and", "random", "characters"]
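To make the contrast concrete, here is a small sketch using a shortened sample string. The delimiter class passed to split is only a guess at what the text might contain, and the second scan pattern (which also captures the runs between words) is an assumption added here to show one way of keeping an array that can be rejoined unchanged.

```ruby
text = "some string, with punctuation- for example: don't panic % !!"

# split needs a delimiter pattern; guessing one leaves stray punctuation attached.
by_split = text.split(/[ ,]+/)

# scan describes the words themselves, so unknown punctuation is simply skipped.
by_scan = text.scan(/[\w']+/)

# Capturing the non-word runs as well gives tokens that join back to the input.
tokens = text.scan(/[\w']+|[^\w']+/)

p by_split
p by_scan
p tokens.join == text
```

With the guessed delimiters, split still returns pieces such as "punctuation-" and "example:", whereas scan returns only the words.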

If you want to replace the matches, there is even less reason to use split. In that case, use gsub with a block.

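text.gsub(/[\w']+/) do |word|
  # is_of_certain_part_of_speech? is a placeholder for your own tagger check; String does not define it.
  if word.is_of_certain_part_of_speech?
    "___"  # Replace the word with a blank.
  else
    word   # Put back the original word.
  end
end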
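As a runnable sketch of the same idea, the version below swaps the placeholder predicate for a hypothetical noun? helper backed by a fixed word list. The NOUNS set, noun?, and blank_out are illustrative names standing in for a real part-of-speech tagger, not part of any library.

```ruby
require "set"

# Hypothetical stand-in for a real part-of-speech tagger: a fixed word list.
NOUNS = Set.new(%w[string punctuation things characters]).freeze

# Returns true when the word is in the illustrative noun list.
def noun?(word)
  NOUNS.include?(word.downcase)
end

# Replaces every matching word with a blank; everything gsub does not match
# (spaces, punctuation, stray symbols) stays exactly where it was.
def blank_out(text)
  text.gsub(/[\w']+/) { |word| noun?(word) ? "___" : word }
end

text = "some string, with punctuation- for example: things I don't know % !!"
puts blank_out(text)
```

Because gsub only rewrites the matched words, the result is already in the original order and there is no separate rejoin step.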
