
Replace with multiple patterns mutually exclusively

I have the following text:

a phrase whith length one, which is "uno"

Using the following dictionary,

1) phrase --- frase
2) a phrase --- una frase
3) one --- uno
4) uno --- one

I'm trying to replace the occurrences of the dictionary items in the text. The desired output is:

[a phrase|una frase] whith length [one|uno], which is "[uno|one]"

I've done this:

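# The dictionary from the list above, in the same order
dictionary = { "phrase" => "frase", "a phrase" => "una frase", "one" => "uno", "uno" => "one" }
text = %(a phrase whith length one, which is "uno")
dictionary.each do |original, translation|
  # Replaces in place, so later entries also see the text inserted by earlier ones
  text.gsub! original, "[#{original}|#{translation}]"
end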

This snippet outputs the following for each dictionary word:

1) a [phrase|frase] whith length one, which is "uno"
2) a [phrase|frase] whith length one, which is "uno"
3) a [phrase|frase] whith length [one|uno], which is "uno"
4) a [phrase|frase] whith length [one|[uno|one]], which is "[uno|one]"

I see two problems here:

  • The word "phrase" is being replaced instead of "a phrase". I think this can be fixed by sorting the dictionary by length, giving priority to longer terms.
  • Words that have already been replaced are replaced again, like "uno" in [one|uno]. I thought of building a single regular expression from the whole list (with Regexp.union), but I don't know how efficient and clean that would be.
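As a quick check of the first idea, here is a minimal sketch (the names pairs and sorted are illustrative, not from the original snippet) that sorts the dictionary longest-first but still replaces one entry at a time. It suggests that sorting alone is not enough: "phrase" is then matched inside the text already inserted for "a phrase".

```ruby
# Illustrative data: the dictionary from the question as a Hash.
pairs = { "phrase" => "frase", "a phrase" => "una frase", "one" => "uno", "uno" => "one" }
text = %(a phrase whith length one, which is "uno").dup

# Longest terms first; ties broken alphabetically so the order is deterministic.
sorted = pairs.sort_by { |original, _| [-original.length, original] }

# Still one gsub! per entry, so each pass also scans earlier replacements.
sorted.each do |original, translation|
  text.gsub!(original, "[#{original}|#{translation}]")
end

puts text
# [a [phrase|frase]|una frase] whith length [one|[uno|one]], which is "[uno|one]"
```

The nesting in the result is the second problem again, which is why the replacements have to happen in one pass over the text.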

Any ideas?

To solve your second problem, you have to replace in a single pass.

Convert the dictionary into a hash, with the key-value pairs in the order you mention (longest keys first) and with each value already formatted as the replacement text.

dictionary = {
  "a phrase" => "[a phrase|una frase]",
  "phrase" => "[phrase|frase]",
  "one" => "[one|uno]",
  "uno" => "[uno|one]",
}
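If the dictionary starts out as plain original/translation pairs, a hash like the one above could be derived rather than typed by hand. This is only a sketch; pairs is an assumed name for the raw dictionary.

```ruby
# Assumed input: the raw dictionary, in the order given in the question.
pairs = { "phrase" => "frase", "a phrase" => "una frase", "one" => "uno", "uno" => "one" }

# Sort longest key first (alphabetical tie-break), then format each value
# as the bracketed replacement text.
dictionary = pairs
  .sort_by { |original, _| [-original.length, original] }
  .map { |original, translation| [original, "[#{original}|#{translation}]"] }
  .to_h

p dictionary.keys
# ["a phrase", "phrase", "one", "uno"]
```

Ruby hashes keep insertion order, so the sorted order carries over to dictionary.keys.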

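Then replace everything in a single pass. Build each alternative as a regexp rather than a double-quoted string: in a string, "\b" is a backspace character, and Regexp.union escapes string arguments, so the word boundaries would be lost.

text.gsub(Regexp.union(*dictionary.keys.map { |w| /\b#{Regexp.escape(w)}\b/ }), dictionary)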
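For reference, here is a self-contained sketch of the single-pass idea, using the block form of gsub so the hash can keep plain translations. The method name annotate and the variable pairs are hypothetical, and the sketch assumes every key starts and ends with a word character, since it relies on \b.

```ruby
# Hypothetical helper: wraps every dictionary term found in text as
# "[original|translation]", scanning the text only once.
def annotate(text, pairs)
  # Longest keys first, so a longer term wins over a shorter one
  # that could match at the same position.
  keys = pairs.keys.sort_by { |k| [-k.length, k] }

  # Regexp.union escapes each key. The word boundaries are written in the
  # regexp literal itself, because "\b" in a double-quoted string is a backspace.
  pattern = /\b(?:#{Regexp.union(keys).source})\b/

  # gsub never rescans text it has already inserted, so nothing is re-replaced.
  text.gsub(pattern) { |match| "[#{match}|#{pairs[match]}]" }
end

pairs = { "phrase" => "frase", "a phrase" => "una frase", "one" => "uno", "uno" => "one" }
puts annotate(%(a phrase whith length one, which is "uno"), pairs)
# [a phrase|una frase] whith length [one|uno], which is "[uno|one]"
```

Because the match is looked up in pairs inside the block, the same hash can be reused for other output formats without rebuilding it.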


 