
How do I replace words in a string based on words in an Array in Ruby?

How would I do the following? I tried using gsub, but I can't figure out an approach that stays efficient when the strings_to_highlight array is large. Cheers!

  string = "Roses are red, violets are blue"

  strings_to_highlight = ['red', 'blue']

  # ALGORITHM HERE

  resulting_string = "Roses are (red), violets are (blue)"

Regexp has a helpful union method for combining regular expressions. Stick with regexps until you have a performance problem:

string = "Roses are red, violets are blue"
strings_to_highlight = ['red', 'blue']

def highlight(str, words)
  matcher = Regexp.union words.map { |w| /\b(#{Regexp.escape(w)})\b/ }
  str.gsub(matcher) { |word| "(#{word})" }
end

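puts highlight(string, strings_to_highlight)

Another approach is to loop over the words and substitute in place:

strings_to_highlight = ['red', 'blue']
string = "Roses are red, violets are blue"

# Regexp.escape keeps any metacharacters in a word from being read as pattern syntax.
strings_to_highlight.each { |i| string.gsub!(/\b#{Regexp.escape(i)}\b/, "(#{i})") }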
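
When a word is interpolated into a regexp, its characters are read as pattern syntax unless the word goes through Regexp.escape first. A minimal sketch of the difference, using a made-up word list and text that are not from the question:

```ruby
# Hypothetical input: the dot in "a.c" is a regexp wildcard unless escaped.
words = ['a.c']
text  = "abc and a.c"

# Unescaped: the dot matches any character, so "abc" is wrapped as well.
unescaped = text.gsub(/\b#{words[0]}\b/) { |m| "(#{m})" }

# Escaped: only the literal "a.c" is wrapped.
escaped = text.gsub(/\b#{Regexp.escape(words[0])}\b/) { |m| "(#{m})" }

puts unescaped
puts escaped
```

For plain alphabetic words such as 'red' and 'blue' the two forms behave the same. Escaping only matters once the list can contain punctuation.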

I suggest using the form of String#gsub that employs a hash for making substitutions.

strings_to_highlight = ['red', 'blue']

First construct the hash.

h = strings_to_highlight.each_with_object({}) do |s,h|
  h[s] = "(#{s})"
  ss = "#{s[0].swapcase}#{s[1..-1]}"
  h[ss] = "(#{ss})"
end
  #=> {"red"=>"(red)", "Red"=>"(Red)", "blue"=>"(blue)", "Blue"=>"(Blue)"}

Next define a default proc for it:

h.default_proc = ->(h,k) { k }

so that if h does not have a key k, h[k] returns k (e.g., h["cat"] #=> "cat").

Ready to go!

string = "Roses are Red, violets are blue"

string.gsub(/[[:alpha:]]+/, h)
  #=> "Roses are (Red), violets are (blue)"

This should be relatively efficient as only one pass through the string is needed and hash lookups are very fast.
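
To see why the default proc matters, here is a small sketch comparing the same substitution with and without it. The variable names (table, without_default, with_default) are made up for illustration, and the hash is written out by hand rather than built from the array:

```ruby
table = { "red" => "(red)", "blue" => "(blue)" }
text  = "Roses are red, violets are blue"

# Without a default, every word missing from the hash is replaced by an
# empty string, so the untouched words disappear.
without_default = text.gsub(/[[:alpha:]]+/, table)

# With the default proc, unknown words map to themselves and pass through.
table.default_proc = ->(_hash, key) { key }
with_default = text.gsub(/[[:alpha:]]+/, table)

puts without_default.inspect
puts with_default.inspect
```

The pattern matches every alphabetic word, not just the highlighted ones, which is why each non-key word needs somewhere sensible to land.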

I'd use:

string = "Roses are red, violets are blue"
strings_to_highlight = ['red', 'blue']

string.gsub(/\b(#{Regexp.union(strings_to_highlight).source})\b/) { |s| "(#{s})" } # => "Roses are (red), violets are (blue)"

Here's how it breaks down:

/\b(#{Regexp.union(strings_to_highlight).source})\b/ # => /\b(red|blue)\b/

It's important to use source when embedding a pattern. Without it, the result is:

/\b(#{Regexp.union(strings_to_highlight)})\b/ # => /\b((?-mix:red|blue))\b/

and that (?-mix:...) part can cause problems if you don't know what it means. It records the flags of the embedded pattern (here m, i, and x all switched off), and those take precedence over any flags set on the enclosing pattern. The Regexp documentation explains the flags, but overlooking this can lead to a bug that is really hard to diagnose.
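
One way the embedded flags can bite, as a sketch: adding /i to the outer pattern has no effect on the embedded part unless source is used. The variable names and the sample text here are assumptions for illustration:

```ruby
words = ['red', 'blue']
text  = "Roses are Red"

# Interpolating the Regexp itself embeds "(?-mix:red|blue)", which switches
# case-insensitivity off again for the alternation.
embedded = /\b(#{Regexp.union(words)})\b/i

# Interpolating only the source leaves the alternation subject to the outer /i.
with_source = /\b(#{Regexp.union(words).source})\b/i

puts embedded.match?(text)     # false: "Red" is not matched
puts with_source.match?(text)  # true
```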

\b matches at a word boundary, so the pattern matches whole words rather than substrings. Without it you could end up with:

string = "Fred, bluette"
strings_to_highlight = ['red', 'blue']
string.gsub(/(#{Regexp.union(strings_to_highlight).source})/) { |s| "(#{s})" } 
# => "F(red), (blue)tte"

Using a block with gsub allows us to perform calculations on the matched values.
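
As a sketch of what such a calculation might look like, the block below upcases each match and counts how often each word was highlighted. The counts hash and the upcasing are illustrative additions, not part of the question:

```ruby
string = "Roses are red, violets are blue"
strings_to_highlight = ['red', 'blue']
pattern = /\b(#{Regexp.union(strings_to_highlight).source})\b/

# Tally matches while building the replacement text.
counts = Hash.new(0)
result = string.gsub(pattern) do |word|
  counts[word] += 1
  "(#{word.upcase})"
end

puts result          # Roses are (RED), violets are (BLUE)
puts counts.inspect
```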
