
Pretty-print to file with REXML

I am having trouble editing an XML file correctly. I want to remove certain elements and then add new ones.

<project>
    <option>
        <name>foo</name>
        <state>0</state>
    </option>
    <option>
        <name>bar</name>
        <state>foo/apple</state>
        <state>foo/orange</state>
    </option>
</project>

I want to remove the states apple and orange and insert grape, lemon and lime. I have tried this code:

#!/usr/bin/ruby -w
require 'fileutils'
require 'rexml/document'
require 'find'
include REXML

path = 'C:\Users\GustavWi\Documents\Gustav\help.xml'
xmlfile = File.new(path)
xmldoc = Document.new(xmlfile)
str_new_elements =["grape","lemon","lime"]
xmldoc.elements.each("project/option") do |parent| 
    if parent.elements['name'].text == 'bar'
        parent.elements.each do |element|
            str = element.text.split('/')
            if str[0] == 'foo'
                parent.delete_element(element)
            end
        end
        str_new_elements.each do |dir|
            state = Element.new("state")
            state.text = dir
            parent.add_element(state)
        end
    end
end

File.open(path,"w") do |data|
        xmldoc.write(data)
end

The problem is that the output is:

<project>
    <option>
        <name>foo</name>
        <state>0</state>
    </option>
    <option>
        <name>bar</name>


    <state>grape</state><state>lemon</state><state>lime</state></option>
</project>

The problem is the empty lines and the missing indentation of the new elements.

I am using Ruby 1.8.6, which might be the cause, but I have not found anything suggesting this is a known problem in 1.8.6.

Almost the same problem can be seen in the book "Programming Ruby The Pragmatic Programmers' Guide" on page 726.

I think the issue here is XML text nodes. REXML doesn't actually ignore whitespace, so the whitespace between your elements is kept as text nodes, and those are what make the output look oddly formatted.

For example, if you look at parent.texts inside your loop you'll see

["\n\t\t", "\n\t\t", "\n\t\t", "\n\t"]

which are the indentations between your elements. When you call delete_element, REXML doesn't touch the surrounding text nodes, so the whitespace that used to separate the deleted elements is left behind and shows up as empty lines in the output. When you call add_element, REXML appends the element after the last text node, i.e. right before the closing </option>, which is why your new elements appear at the wrong indentation level.
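A minimal sketch of that behaviour, assuming a tab-indented copy of the "bar" option pasted inline as a string (the xml and option variables are only for illustration, not part of your script):

```ruby
require 'rexml/document'

# Hypothetical inline copy of the "bar" option, indented with tabs.
xml = "<option>\n\t\t<name>bar</name>\n\t\t<state>foo/apple</state>" \
      "\n\t\t<state>foo/orange</state>\n\t</option>"
option = REXML::Document.new(xml).root

# The whitespace between the elements is stored as ordinary text nodes.
p option.texts.map { |t| t.to_s }

# Deleting the <state> elements leaves those text nodes where they were.
option.elements.to_a('state').each { |state| option.delete_element(state) }
p option.to_s

# A new element is appended after the last text node, right before </option>.
option.add_element('state').text = 'grape'
p option.to_s
```

The second p should show the leftover whitespace runs after <name>, and the third should show <state>grape</state> glued directly to </option>.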

I see two solutions:

  1. Monkey around with the text nodes before output to make sure that the indentation is nice. This looks to be pretty difficult to do with REXML since it tries pretty hard to keep text nodes out of the way.
  2. If you don't care about whitespace, let REXML do the indentation for you: xmldoc.write(data, 4). However, this also adds whitespace to the text nodes of each element, i.e. "bar" becomes "\n bar\n ".
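A variation on the second option, as a sketch only: it assumes your REXML version ships REXML::Formatters::Pretty (I have not checked what Ruby 1.8.6 bundles). As far as I can tell this is the formatter that write(data, 4) uses internally, and setting its compact flag keeps short text-only elements such as <name>bar</name> on one line. The xml string is a shortened, hypothetical copy of your broken output.

```ruby
require 'rexml/document'
require 'rexml/formatters/pretty'

# Hypothetical, shortened copy of the badly formatted output from the question.
xml = "<project>\n    <option>\n        <name>bar</name>\n\n\n" \
      "    <state>grape</state><state>lemon</state></option>\n</project>"
doc = REXML::Document.new(xml)

formatter = REXML::Formatters::Pretty.new(4) # indent each level by 4 spaces
formatter.compact = true                     # keep short text-only elements on one line

output = String.new
formatter.write(doc, output)
puts output
```

In your script the same formatter could write straight to the file: File.open(path, "w") { |data| formatter.write(xmldoc, data) }. Note that the formatter drops whitespace-only text nodes and collapses runs of whitespace inside text, so it is only suitable if whitespace in the document is not significant.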

Frankly, REXML is not a very well-designed library. For one thing, it clearly can't decide how it wants to treat whitespace. Have you tried Nokogiri?
