
Using select rather than gsub to avoid multiple regex evaluations in Ruby

Here is one approach that chains several gsub calls. It does what I want (removes everything except the text), but it requires multiple regex evaluations.

words = IO.read("file.txt").
gsub(/\s/, ""). # delete white spaces
gsub(".",""). # delete periods
gsub(",",""). # delete commas
gsub("?","") # delete Q marks
puts words
# output
#      WheninthecourseofhumaneventsitbecomesnecessaryIwanttobelieveyoureallyIdobutwhoamItoblameWhenthefactsarecountedthenumberswillbereportedLotsoflaughsCharlieIthinkIheardthatonetentimesbefore

After reading the post "Ruby gsub : is there a better way", I tried using match to get the same result without multiple regex evaluations, but I don't get the same output.

words = IO.read("file.txt").
match(/(\w*)+/)
puts words
# output - this only gets the first word
# When

And this only gets the first sentence:

words = IO.read("file.txt").
match(/(...*)+/)
puts words

# output - this only gets the first sentence
# When in the course of human events it becomes necessary.

Any suggestions for getting the same output (including stripping out white space and non-word characters) using match rather than gsub?

You can do what you want in one gsub operation:

s = 'When in the course of human events it becomes necessary.'
s.gsub(/[\s.,?]/, '')
# => "Wheninthecourseofhumaneventsitbecomesnecessary"
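As a rough check, the single character class should produce the same string as the four chained calls from the question. In the sketch below, `text` is a made-up stand-in for the contents of file.txt, and the `String#delete` line is an alternative not mentioned in the original answer; it takes a set of characters rather than a regex.

```ruby
# Sample text standing in for file.txt (an assumption for illustration).
text = "When in the course of human events it becomes necessary. " \
       "I want to believe you, really I do, but who am I to blame?\n"

# The four chained calls from the question.
chained = text.gsub(/\s/, "").gsub(".", "").gsub(",", "").gsub("?", "")

# One pass over the string with a character class.
single = text.gsub(/[\s.,?]/, "")

# Alternative without a regex: String#delete removes every listed character.
# The whitespace characters are spelled out because delete does not know \s.
deleted = text.delete(" \t\r\n\f\v.,?")

puts single
puts chained == single && single == deleted
```

Inside a character class the `.` and `?` are literal, so they need no escaping.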

You don't need multiple regex evaluations for this.

str = "# output - this only gets the first sentence
# When in the course of human events it becomes necessary."
p str.gsub(/\W/, "")
#=>"outputthisonlygetsthefirstsentenceWheninthecourseofhumaneventsitbecomesnecessary"
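On the original question about match: `String#match` returns only the first match, which is why the attempts in the question stop after the first word or sentence. `String#scan` returns every match, so joining its results is one way to get a match-style equivalent. A minimal sketch, with `text` as a made-up sample rather than the real file contents:

```ruby
# Sample text (an assumption for illustration).
text = "When in the course of human events it becomes necessary.\n" \
       "Lots of laughs, Charlie?"

# match stops at the first hit and returns a MatchData object.
first = text.match(/\w+/)[0]

# scan walks the whole string and returns an array of all matches.
words = text.scan(/\w+/)
joined = words.join

puts first
puts joined
puts joined == text.gsub(/\W/, "")
```

Note that `\W` and `\w` are broader than the four characters the question removes: `\W` also strips apostrophes, hyphens and other punctuation, while `\w` keeps digits and underscores.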
