Ruby - Convert XML to JSON without using Gems

I recently worked on a coding exercise that involved converting XML to JSON. The sane way to do this could be to use the JSON and ActiveSupport gems as described here. That's what I'd do in production, but it doesn't make me a better coder. So, I've put together my own script, and it works, but frankly I think it's hideous. My question is, how could it be better? What types of techniques and methodologies could I use to make this script simpler, more readable, and more professional?

For reference, we start with the following input.html (clean - no edge cases):

 <html>
     <body>
          <ul>
               <li>Item One</li>
               <li>Item Two</li>
               <li>
                    <ul>
                         <li>A</li>
                         <li>B</li>
                    </ul>     
               </li>
          </ul>
     </body>
</html>

And the JSON output looks like so:

{ "html": { "body": { "ul": { "li": ["Item One", "Item Two", { "ul": { "li": ["A", "B"] }  } ]  }  }  }

And here's the script - *xml_to_json.rb*:

#!/usr/bin/env ruby

def ucode_strip(obj)
    #remove weird whitespace
    return obj.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
end

def element?(obj)
    #returns true if text is an xml element --e.g. "<html>"
    if ucode_strip(obj) =~ /\A<.*?>/
        true
    end
end

def element(obj)
    #returns html element name --e.g. "<html>" => html, "html" => nil
    stripped = ucode_strip(obj)
    parts = stripped.split(/>/)
    return parts[0].sub(/</, '')
end

def value?(obj)
    #does the line contain information inside of tags <tag>value</tag>
    parts = obj.split(/>/)
    unless !parts[1]
        true
    end
end

def value(obj)
    #returns the value of an xml element --e.g. "<li>item</li>" => "item"
    parts = obj.split(/\</)
    parts[0]
end

def convert_file(file)

    text = File.read(file.to_s)

    lines = text.split(/\n/)

    last_tag = nil
    same_tags = nil
    multiple_values = []

    json = "{ "

    lines.each do |line|
        clean = ucode_strip(line)
        if line =~ /<.*?>/
            unless clean =~ /\A<\// #<opening tag>
                line_elements = clean.split(/>/)

                tag = "\"" + element(line) + "\"" + ':'

                if line_elements[1]
                    #there's more data in this line, not just a tag
                    unless same_tags == true
                        same_tags = true
                        json += tag + " ["
                        last_tag = element(line)
                    else
                        json += ", "
                    end

                    json += "\"" + value(line_elements[1]) + "\""
                else
                    #this line only contains an opening tag
                    same_tags = false #stop building list
                    unless element(line) == last_tag #the previous line started with the same tag
                        json += tag += " { "
                    else
                        json += ", { "
                    end
                    last_tag = tag
                end
            else #</closing tag>
                if same_tags == true
                    #we have a closing tag while building a list
                    same_tags = false #stop building list
                    json += "] } " #first close list, then bracket
                else
                    if clean =~ /#{last_tag}/
                        json += " } ] " #there's an open list we have to close
                    else
                        json += " } " #no open list, just close bracket
                    end
                end
            end
        end
    end

    return json

end

input =  ARGV.first
puts convert_file(input)

As I said, this works, but I know it could be better. I realize that there's little-to-no edge case handling, but I'm much more concerned with the way I'm handling the data as a whole. Someone suggested using a ruby list as a stack to store the nested JSON, but I haven't quite figured that out yet. Any help would be much appreciated - if you've gotten this far, thanks for reading.

The convert_file method is too complex: it mixes the parsing logic with the building logic, which makes it difficult to read, reuse, and extend. My suggestion is to separate the two.

The code that reads the XML file is called a parser. As far as I know there are three kinds of parsers: SAX, DOM, and StAX. You can find references for each through Google and see how they can be implemented.
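To make the parser idea concrete, here is a minimal sketch of a SAX-style tokenizer that turns the markup into a flat list of events and knows nothing about JSON. The method name `tokenize` and the event format (`[:open, name]`, `[:text, value]`, `[:close, name]`) are my own assumptions for illustration, and it only handles plain tags and text like the sample input (no attributes, comments, or entities).

```ruby
# Hypothetical helper: split simple markup into a flat list of parse events.
# Assumes plain tags and text only (no attributes, comments, or entities).
def tokenize(xml)
  events = []
  # Each match is either a tag (optional slash + name) or a run of text.
  xml.scan(%r{<(/?)([^<>/\s]+)\s*>|([^<>]+)}) do |slash, name, text|
    if name
      events << [slash.empty? ? :open : :close, name]
    elsif !text.strip.empty?
      # Whitespace-only text between tags is just indentation, so skip it.
      events << [:text, text.strip]
    end
  end
  events
end

p tokenize("<ul>\n  <li>A</li>\n</ul>")
# prints [[:open, "ul"], [:open, "li"], [:text, "A"], [:close, "li"], [:close, "ul"]]
```

Because the tokenizer only reports what it sees, the code that decides how the JSON should look can live somewhere else entirely.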

The code that generates the JSON is called a builder. Builder is a design pattern; you can find more information about it through Google as well.
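As a rough sketch of the builder side, the method below turns nested Ruby data (Hash, Array, String) into JSON text without any gems. The names `build_json`, `escape_json`, and `JSON_ESCAPES` are hypothetical, and the escaping covers only quotes, backslashes, newlines, and tabs, so treat it as an illustration rather than a complete JSON serializer.

```ruby
# Hypothetical builder: serialize nested Hash/Array/String data as JSON text.
# Only a few characters are escaped; other control characters are not handled.
JSON_ESCAPES = { '"' => '\\"', '\\' => '\\\\', "\n" => '\\n', "\t" => '\\t' }.freeze

def escape_json(str)
  # With a Hash argument, gsub replaces each match with the value for that key.
  str.gsub(/["\\\n\t]/, JSON_ESCAPES)
end

def build_json(node)
  case node
  when Hash
    pairs = node.map { |key, value| "\"#{escape_json(key.to_s)}\": #{build_json(value)}" }
    "{ #{pairs.join(', ')} }"
  when Array
    "[#{node.map { |item| build_json(item) }.join(', ')}]"
  else
    # Everything else is written as a JSON string.
    "\"#{escape_json(node.to_s)}\""
  end
end

puts build_json({ "ul" => { "li" => ["A", "B"] } })
# prints { "ul": { "li": ["A", "B"] } }
```

Since the recursion closes every brace and bracket it opens, the output stays balanced no matter how deeply the data is nested.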

A stack is a data structure that you may use when implementing the parser and the builder.
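Here is one way the stack could be used, as a sketch under some assumptions: the input is a list of events in a made-up format (`[:open, name]`, `[:text, value]`, `[:close, name]`), and the helper names `build_tree` and `add_child` are hypothetical. Each opening tag pushes a frame, each closing tag pops it and attaches the finished value to its parent, and repeated sibling names are collected into an array.

```ruby
# Hypothetical event format: [:open, name], [:text, value], [:close, name].
# Attach a finished value to its parent, grouping repeated names into an Array.
def add_child(children, name, value)
  if children.key?(name)
    children[name] = [children[name]] unless children[name].is_a?(Array)
    children[name] << value
  else
    children[name] = value
  end
end

def build_tree(events)
  root = { name: nil, children: {}, text: nil }
  stack = [root] # stack.last is always the innermost open element

  events.each do |type, payload|
    case type
    when :open
      stack.push({ name: payload, children: {}, text: nil })
    when :text
      stack.last[:text] = payload
    when :close
      frame = stack.pop
      raise ArgumentError, "unexpected </#{payload}>" unless frame[:name] == payload
      # A leaf element becomes its text; anything else becomes its children.
      value = frame[:children].empty? ? frame[:text].to_s : frame[:children]
      add_child(stack.last[:children], frame[:name], value)
    end
  end

  raise ArgumentError, "unclosed <#{stack.last[:name]}>" unless stack.size == 1
  root[:children]
end

events = [[:open, "ul"], [:open, "li"], [:text, "A"], [:close, "li"],
          [:open, "li"], [:text, "B"], [:close, "li"], [:close, "ul"]]
p build_tree(events)
```

For these events the result is a Hash equal to `{ "ul" => { "li" => ["A", "B"] } }`. One difference from the original script is that a single `<li>` would come out as a plain string rather than a one-element list; whether that is desirable is a design choice.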
