How to substitute all characters in a string except for some (in Ruby)

I'm having trouble finding an appropriate method for string substitution. I would like to replace every character in a string except for a selection of words or phrases (provided in an array). I know there's a gsub method, but what I'm trying to achieve is its reverse. For example:

My string: "Part of this string needs to be substituted"

Keywords: ["this string", "substituted"]

Desired output: "**** ** this string ***** ** ** substituted"

P.S. It's my first question ever, so your help will be greatly appreciated!

Here's a different approach. First, do the reverse of what you ultimately want: redact what you want to keep. Then compare this redacted string to your original, character by character: if the characters are the same (and not a space), redact; if they differ, keep the original.

class String
  # Returns a string with all words except those passed in as keepers
  # redacted.
  #
  #      "Part of this string needs to be substituted".gsub_except(["this string", "substituted"], '*')
  #      # => "**** ** this string ***** ** ** substituted"
  def gsub_except keep, mark
    reverse_keep = self.dup
    keep.each_with_object(Hash.new(0)) { |e, a| a[e] = mark * e.length }
             .each { |word, redacted| reverse_keep.gsub! word, redacted }
    reverse_keep.chars.zip(self.chars).map do |redacted, original|
      redacted == original && original != ' ' ?  mark : original
    end.join
  end
end

You can use something like:

str="Part of this string needs to be substituted"
keep = ["this","string", "substituted"]

str.split(" ").map{|word| keep.include?(word) ? word : word.split("").map{|w| "*"}.join}.join(" ")

but this works only for keeping single words, not phrases.
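To make that limitation concrete, here is a small sketch that wraps the one-liner above in a helper (the name `redact_words` is made up for this illustration) and calls it with both keyword lists. Because the string is split on spaces, a phrase such as "this string" never appears as a single element and so is never matched.

```ruby
# Hypothetical helper wrapping the word-by-word one-liner above.
def redact_words(str, keep)
  str.split(" ")
     .map { |word| keep.include?(word) ? word : "*" * word.length }
     .join(" ")
end

str = "Part of this string needs to be substituted"

# Single words as keywords: each kept word is matched on its own.
puts redact_words(str, ["this", "string", "substituted"])
# prints: **** ** this string ***** ** ** substituted

# A phrase as keyword: "this" and "string" are compared separately,
# so neither equals "this string" and both get redacted.
puts redact_words(str, ["this string", "substituted"])
# prints: **** ** **** ****** ***** ** ** substituted
```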

This might be a little more understandable than my last answer (the code is JavaScript, but the idea carries over to Ruby):

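// JavaScript
const s = "Part of this string needs to be substituted"
const k = ["this string", "substituted"]

// Pass 1: in a working copy, replace each keyword with stars of the same length.
let tmp = s
for (const key of k) {
    tmp = tmp.replace(key, function(x){ return "*".repeat(x.length) })
}

// Pass 2: where tmp holds a star (a keyword was there) or a space, keep the
// original character; everywhere else, put a star.
const res = s.split("")
for (let charIdx = 0; charIdx < s.length; charIdx++) {
    if (tmp[charIdx] != "*" && tmp[charIdx] != " ") {
        res[charIdx] = "*"
    } else {
        res[charIdx] = s.charAt(charIdx)
    }
}
const finalResult = res.join("")
// finalResult is "**** ** this string ***** ** ** substituted"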

Explanation:

This builds on my previous idea of using the positions of the keywords to decide which portions of the string to replace with stars. First off:

We replace each keyword with a run of stars of the same length. So:

s.replace("this string", function(x){
    return "*".repeat(x.length)
})

replaces the portion of s that matches "this string" with x.length asterisks.

We do this for each key. For completeness, you should make sure that the replacement is global and not just the first match found, e.g. /this string/g. I didn't do that in the answer, but you should be able to figure out how to use new RegExp yourself.

Next up, we split a copy of the original string into an array. If you're a visual person, it may help to think of this as a weird sort of character addition:

"Part of this string needs to be substituted"
"Part of *********** needs to be substituted" +
---------------------------------------------
 **** ** this string ***** ** ** ***********

is what we're going for. So if our tmp variable has a star at a given position, we bring over the original character; otherwise we replace the character with a *.

This is easily done with an if statement. To match the example in the question, we also bring over the original character if it's a space. Lastly, we join the array back into a string via .join("") so that you can work with a string again.

Make sense?
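Since the question asks about Ruby, here is a rough Ruby sketch of the same two-pass idea. The method name `mask_except` and the `mark` parameter are assumptions made for illustration, not part of the answer above. Note that Ruby's String#gsub already replaces every occurrence, so the "global" caveat from the JavaScript version does not arise here.

```ruby
# A sketch of the two-pass approach in Ruby; names are illustrative only.
def mask_except(str, keywords, mark = "*")
  # Pass 1: in a working copy, replace every occurrence of each keyword
  # with a run of marks of the same length.
  tmp = keywords.reduce(str) do |acc, kw|
    acc.gsub(kw) { |match| mark * match.length }
  end

  # Pass 2: where the working copy holds a mark (a keyword was there) or a
  # space, keep the original character; everywhere else, emit a mark.
  str.chars.zip(tmp.chars).map do |original, masked|
    (masked == mark || masked == " ") ? original : mark
  end.join
end

puts mask_except("Part of this string needs to be substituted",
                 ["this string", "substituted"])
# prints: **** ** this string ***** ** ** substituted
```

As with the JavaScript version, this assumes the input does not already contain the mark character outside the keywords; such a character would be treated as part of a keyword and kept as-is, which happens to be harmless when the mark is "*".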

You can use the following approach: collect the substrings that you need to turn into asterisks, and then perform this replacement:

str="Part of this string needs to be substituted"
arr = ["this string", "substituted"]

arr_to_remove = str.split(Regexp.new("\\b(?:" + arr.map { |x| Regexp.escape(x) }.join('|') + ")\\b|\\s+")).reject { |s| s.empty? }

arr_to_remove.each do |s|
    str = str.gsub(s, "*" * s.length)
end
puts str

Output of the demo program:

**** ** this string ***** ** ** substituted
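One caveat worth keeping in mind with the code above: each collected fragment is replaced wherever it occurs in the string, so a fragment that also appears inside a kept phrase gets starred out there too. For the input "this is this string" with the keyword "this string", the fragment "this" is replaced in both places. The sketch below is a different, single-pass technique (not the method from this answer) that avoids the problem by matching either a keyword or a single non-space character. The name `redact_except` is hypothetical.

```ruby
# Single-pass alternative: match a keyword (kept) or any non-space char (starred).
def redact_except(str, keywords, mark = "*")
  # Regexp.union escapes each keyword; longest first so a longer phrase
  # wins over a shorter keyword that is a prefix of it.
  alternation = Regexp.union(keywords.sort_by { |k| -k.length })
  # If the capture group matched, keep it; otherwise the match was one
  # non-space character, which becomes a mark. Spaces never match.
  str.gsub(/(#{alternation})|\S/) { $1 || mark }
end

puts redact_except("Part of this string needs to be substituted",
                   ["this string", "substituted"])
# prints: **** ** this string ***** ** ** substituted

puts redact_except("this is this string", ["this string"])
# prints: **** ** this string
```

This sketch matches keywords anywhere, without the \b word boundaries used above; whether that is desirable depends on the use case.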

You can do that using the form of String#split that uses a regex with a capture group.

Code

def sub_some(str, keywords)
  str.split(/(#{keywords.join('|')})/)
     .map {|s| keywords.include?(s) ? s : s.gsub(/./) {|c| (c==' ') ? c : '*'}}
     .join
end

Example

str = "Part of this string needs to be substituted"
keywords = ["this string", "substituted"]
sub_some(str, keywords)
  #=> "**** ** this string ***** ** ** substituted" 

Explanation

r = /(#{keywords.join('|')})/
  #=> /(this string|substituted)/ 
a = str.split(r)
  #=> ["Part of ", "this string", " needs to be ", "substituted"] 
e = a.map
  #=> #<Enumerator: ["Part of ", "this string", " needs to be ",
  #     "substituted"]:map> 

s = e.next
  #=> "Part of " 
keywords.include?(s) ? s : s.gsub(/./) { |c| (c==' ') ? c : '*' }
  #=> s.gsub(/./) { |c| (c==' ') ? c : '*' }
  #=> "Part of "gsub(/./) { |c| (c==' ') ? c : '*' }
  #=> "**** ** " 

s = e.next
  #=> "this string" 
keywords.include?(s) ? s : s.gsub(/./) { |c| (c==' ') ? c : '*' }
  #=> s
  #=> "this string" 

and so on... Lastly,

["**** ** ", "this string", " ***** ** ** ", "substituted"].join('|') 
  #=> "**** ** this string ***** ** ** substituted" 

Note that, prior to v.1.9.3, Enumerable#map did not return an enumerator when no block was given. The calculations are the same, however.
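The pattern above interpolates the keywords into the regex unescaped, which is fine for plain words but can misbehave when a keyword contains regex metacharacters such as "+" or ".". A hedged variant, assuming you want keywords treated as literal text, builds the alternation with Regexp.union, which escapes each string. The method name `sub_some_literal` is made up for this sketch.

```ruby
# Variant of the split-with-capture-group approach that treats keywords
# as literal strings (illustrative name, not from the answer above).
def sub_some_literal(str, keywords)
  # Regexp.union escapes metacharacters; the outer parentheses form the
  # capture group that makes String#split keep the matched keywords.
  pattern = /(#{Regexp.union(keywords)})/
  str.split(pattern)
     .map { |s| keywords.include?(s) ? s : s.gsub(/\S/, "*") }
     .join
end

puts sub_some_literal("1+1 is 2", ["1+1"])
# prints: 1+1 ** *
```

With the unescaped pattern /(1+1)/, the "+" acts as a quantifier, so the literal text "1+1" is not matched and the whole string is redacted.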

str = "Part of this string needs to be substituted"
keywords = ["this string", "substituted"]

pattern = /(#{keywords.join('|')})/

str.split(pattern).map {|i| keywords.include?(i) ? i : i.gsub(/\S/,"*")}.join
#=> "**** ** this string ***** ** ** substituted"

A more readable version of the same code:

str = "Part of this string needs to be substituted"
keywords = ["this string", "substituted"]

#Use regexp pattern to split string around keywords.
pattern = /(#{keywords.join('|')})/ #pattern => /(this string|substituted)/
str = str.split(pattern) #=> ["Part of ", "this string", " needs to be ", "substituted"]

redacted = str.map do |i|
    if keywords.include?(i)
        i
    else
        i.gsub(/\S/,"*") # replace all non-whitespace characters with "*"
    end
end      
# redacted => ["**** ** ", "this string", " ***** ** ** ", "substituted"]
redacted.join
