
How to replace every occurrence of a pattern in a string using Ruby?

I have an XML file that is too big. To make it smaller, I want to replace all tag and attribute names with shorter versions of the same names.

So, I implemented this:

string.gsub!(/<(\w+) /) do |match|
    case match
    when 'Image' then 'Img'
    when 'Text'  then 'Txt'
    end
end

puts string

This deletes all opening tags but does not do much else.

What am I doing wrong here?

Here's another way:

class String
  def minimize_tags!
    {"image" => "img", "text" => "txt"}.each do |from,to|
      gsub!(/<#{from}\b/i,"<#{to}")
      gsub!(/<\/#{from}>/i,"<\/#{to}>")
    end
    self
  end
end

This will probably be a little easier to maintain, since the replacement patterns are all in one place. And on strings of any significant size, it may be a lot faster than Kevin's way. I did a quick speed test of these two methods using the HTML source of this Stack Overflow page itself as the test string, and my way was about 6x faster.
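If reopening `String` is not desirable, the same mapping-driven idea can be sketched as a plain method. This is only an illustration: the names `TAG_MAP` and `minimize_tags` and the sample string are made up for the example, and `Regexp.escape` is added as a precaution in case a tag name ever contains a regex metacharacter.

```ruby
# Hypothetical mapping of long tag names to short ones (same pairs as above).
TAG_MAP = { "image" => "img", "text" => "txt" }.freeze

# Returns a new string with opening and closing tags renamed.
# Attribute names are left alone, just as in the String#minimize_tags! version.
def minimize_tags(xml, map = TAG_MAP)
  map.reduce(xml) do |result, (from, to)|
    name = Regexp.escape(from)
    result
      .gsub(/<#{name}\b/i, "<#{to}")     # opening tags, e.g. <Image ...>
      .gsub(/<\/#{name}>/i, "</#{to}>")  # closing tags, e.g. </Image>
  end
end

# Made-up sample input for illustration.
sample = '<Image ImagePath="a.png">pic</Image><Text TextSize="9">hi</Text>'
result = minimize_tags(sample)
puts result
# prints: <img ImagePath="a.png">pic</img><txt TextSize="9">hi</txt>
```

Because the method returns a new string instead of mutating its receiver, the original input stays available for comparison.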

Here's the beauty of using a parser such as Nokogiri:

This lets you manipulate selected tags (nodes) and their attributes:

require 'nokogiri'

xml = <<EOT
<xml>
  <Image ImagePath="path/to/image">image comment</Image>
  <Text TextFont="courier" TextSize="9">this is the text</Text>
</xml>
EOT

doc = Nokogiri::XML(xml)
doc.search('Image').each do |n| 
  n.name = 'img' 
  n.attributes['ImagePath'].name = 'path'
end
doc.search('Text').each do |n| 
  n.name = 'txt'
  n.attributes['TextFont'].name = 'font'
  n.attributes['TextSize'].name = 'size'
end
print doc.to_xml
# >> <?xml version="1.0"?>
# >> <xml>
# >>   <img path="path/to/image">image comment</img>
# >>   <txt font="courier" size="9">this is the text</txt>
# >> </xml>

If you need to iterate through every node, maybe to do a universal transformation on the tag name, you can use doc.search('*').each. That would be slower than searching for individual tags, but might result in less code if you need to change every tag.

The nice thing about using a parser is that it keeps working if the layout of the XML changes, since it doesn't care about whitespace, and if the attribute order changes, which makes your code more robust.

Try this:

string.gsub!(/(<\/?)(\w+)/) do |match|
  tag_mark = $1
  case $2
  when /^image$/i
    "#{tag_mark}Img"
  when /^text$/i
    "#{tag_mark}Txt"
  else
    match
  end
end  
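To see why the version in the question strips tags while this one renames them, here is a small side-by-side run. The sample string and the variable names `broken` and `fixed` are made up for illustration, and the non-mutating `gsub` is used so both results can be compared against the same input.

```ruby
# Made-up sample input; <Other /> stands in for a tag with no short form.
sample = '<Image ImagePath="a.png">pic</Image><Text TextSize="9">hi</Text><Other />'

# Question's approach: the block receives the whole match (e.g. "<Image "),
# not the captured name, so no `when` branch fires. The block then returns
# nil, and gsub substitutes an empty string for the match.
broken = sample.gsub(/<(\w+) /) do |match|
  case match
  when 'Image' then 'Img'
  when 'Text'  then 'Txt'
  end
end

# Answer's approach: read the capture groups via $1 and $2. $1 is saved
# first because the regex comparisons in `when` overwrite the match globals.
# Unknown tags fall through to `else` and are returned unchanged.
fixed = sample.gsub(/(<\/?)(\w+)/) do |match|
  tag_mark = $1
  case $2
  when /^image$/i then "#{tag_mark}Img"
  when /^text$/i  then "#{tag_mark}Txt"
  else match
  end
end

puts broken
# prints: ImagePath="a.png">pic</Image>TextSize="9">hi</Text>/>
puts fixed
# prints: <Img ImagePath="a.png">pic</Img><Txt TextSize="9">hi</Txt><Other />
```

The `else match` branch is what keeps unrecognised tags intact; without it they would be deleted in the same way as in the question.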
