I recently worked on a coding exercise that involved converting XML to JSON. The sane way to do this would be to use the JSON and ActiveSupport gems as described here. That's what I'd do in production, but it doesn't make me a better coder. So I've put together my own script, and it works, but frankly I think it's hideous. My question is: how could it be better? What techniques and methodologies could I use to make this script simpler, more readable, and more professional?
For reference, we start with the following input.html (clean, with no edge cases):
```html
<html>
  <body>
    <ul>
      <li>Item One</li>
      <li>Item Two</li>
      <li>
        <ul>
          <li>A</li>
          <li>B</li>
        </ul>
      </li>
    </ul>
  </body>
</html>
```
And the JSON output looks like so:
```json
{ "html": { "body": { "ul": { "li": ["Item One", "Item Two", { "ul": { "li": ["A", "B"] } } ] } } } }
```
And here's the script - *xml_to_json.rb*:
```ruby
#!/usr/bin/env ruby

def ucode_strip(obj)
  # remove weird whitespace (including Unicode spaces) from both ends
  return obj.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
end

def element?(obj)
  # returns true if text is an xml element --e.g. "<html>"
  if ucode_strip(obj) =~ /\A<.*?>/
    true
  end
end

def element(obj)
  # returns the element name --e.g. "<html>" => "html", "<li>Item</li>" => "li"
  stripped = ucode_strip(obj)
  parts = stripped.split(/>/)
  return parts[0].sub(/</, '')
end

def value?(obj)
  # does the line contain information inside of tags <tag>value</tag>
  parts = obj.split(/>/)
  unless !parts[1]
    true
  end
end

def value(obj)
  # returns the text before the closing tag --e.g. "item</li" => "item"
  parts = obj.split(/\</)
  parts[0]
end

def convert_file(file)
  text = File.read(file.to_s)
  lines = text.split(/\n/)
  last_tag = nil
  same_tags = nil
  multiple_values = []
  json = "{ "
  lines.each do |line|
    clean = ucode_strip(line)
    if line =~ /<.*?>/
      unless clean =~ /\A<\// # <opening tag>
        line_elements = clean.split(/>/)
        tag = "\"" + element(line) + "\"" + ':'
        if line_elements[1]
          # there's more data in this line, not just a tag
          unless same_tags == true
            same_tags = true
            json += tag + " ["
            last_tag = element(line)
          else
            json += ", "
          end
          json += "\"" + value(line_elements[1]) + "\""
        else
          # this line only contains an opening tag
          same_tags = false # stop building list
          unless element(line) == last_tag
            json += tag + " { "
          else
            # same tag as the previous element: add another item to the open list
            json += ", { "
          end
          last_tag = tag # note: holds the quoted tag here (e.g. "ul":), not the bare name
        end
      else # </closing tag>
        if same_tags == true
          # we have a closing tag while building a list
          same_tags = false # stop building list
          json += "] } " # first close list, then bracket
        else
          if clean =~ /#{last_tag}/
            json += " } ] " # there's an open list we have to close as well
          else
            json += " } " # no open list, just close bracket
          end
        end
      end
    end
  end
  json += "}" # close the outer "{ " opened before the loop
  return json
end

input = ARGV.first
puts convert_file(input)
```
As I said, this works, but I know it could be better. I realize that there's little to no edge-case handling, but I'm much more concerned with the way I'm handling the data as a whole. Someone suggested using a Ruby array as a stack to store the nested JSON, but I haven't quite figured that out yet. Any help would be much appreciated; if you've gotten this far, thanks for reading.
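The stack suggestion could look roughly like the following minimal sketch. It assumes the same clean, one-tag-per-line input as input.html above (every line is `<tag>`, `</tag>`, or `<tag>text</tag>`) and ignores attributes, comments, and mixed content. Instead of concatenating JSON text by hand, it pushes a fresh Hash when an element opens, pops it when the element closes, and lets Ruby's standard `json` library do the serializing. `xml_lines_to_hash` and `add_child` are hypothetical names.

```ruby
require 'json'

# Adds `value` under `name`; a name seen twice turns the entry into an Array,
# which is how the repeated <li> elements end up as a JSON list.
def add_child(parent, name, value)
  if parent.key?(name)
    parent[name] = [parent[name]] unless parent[name].is_a?(Array)
    parent[name] << value
  else
    parent[name] = value
  end
end

# Assumes one tag per line: "<tag>", "</tag>" or "<tag>text</tag>".
# Any other line (blank lines, attributes, comments...) is ignored.
def xml_lines_to_hash(text)
  root = {}
  stack = [root] # stack.last is the Hash for the innermost open element
  names = []     # names of the open elements, used to check closing tags

  text.each_line do |raw|
    line = raw.strip
    case line
    when %r{\A<(\w+)>(.*)</\1>\z} # <tag>text</tag>: a leaf value
      add_child(stack.last, $1, $2)
    when %r{\A</(\w+)>\z}         # </tag>: pop and attach to the parent
      name = names.pop
      raise "expected </#{name}>, got #{line}" unless name == $1
      finished = stack.pop
      add_child(stack.last, name, finished)
    when /\A<(\w+)>\z/            # <tag>: push a new, empty element
      names.push($1)
      stack.push({})
    end
  end
  raise "unclosed element <#{names.last}>" unless names.empty?
  root
end

# Usage, replacing the last two lines of xml_to_json.rb:
#   puts JSON.pretty_generate(xml_lines_to_hash(File.read(ARGV.first)))
```

Because the output is built as real Ruby objects first, the brackets and commas can no longer get out of balance; the JSON library takes care of them.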
The `convert_file` method is too complex: it mixes the parsing logic and the JSON-building logic together, which makes it difficult to read, reuse, and extend. I'd suggest separating the two.
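One way to separate them, sketched under the same one-tag-per-line assumption as input.html: `parse_tree` only turns text into a tree of `Node` structs, and `to_json_value` only turns that tree into plain Ruby data for `JSON.generate`. `Node`, `parse_tree`, and `to_json_value` are hypothetical names. Either half could then be swapped out (for example, replacing the parser with a real XML library) without touching the other.

```ruby
require 'json'

# Parser half: text in, tree of Nodes out. It knows nothing about JSON.
Node = Struct.new(:name, :text, :children)

def parse_tree(text)
  root = Node.new(nil, nil, [])
  open_nodes = [root] # elements whose closing tag has not been seen yet
  text.each_line do |raw|
    line = raw.strip
    case line
    when %r{\A<(\w+)>(.*)</\1>\z} # <tag>text</tag>
      open_nodes.last.children << Node.new($1, $2, [])
    when %r{\A</(\w+)>\z}         # </tag>
      closed = open_nodes.pop
      raise "unexpected </#{$1}>" unless closed.name == $1
    when /\A<(\w+)>\z/            # <tag>
      node = Node.new($1, nil, [])
      open_nodes.last.children << node
      open_nodes.push(node)
    end
  end
  root
end

# Builder half: tree in, plain Ruby data out. It knows nothing about XML text.
# A leaf becomes its text; repeated child names are collected into an Array.
def to_json_value(node)
  return node.text.to_s if node.children.empty?

  node.children.each_with_object({}) do |child, hash|
    value = to_json_value(child)
    if hash.key?(child.name)
      hash[child.name] = [hash[child.name]] unless hash[child.name].is_a?(Array)
      hash[child.name] << value
    else
      hash[child.name] = value
    end
  end
end

# Usage: puts JSON.generate(to_json_value(parse_tree(File.read(ARGV.first))))
```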
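The code that reads the XML is called a parser. There are three common kinds of XML parser that I know of: SAX (event-driven: the parser pushes callbacks such as "element started" to your code), DOM (the whole document is loaded into an in-memory tree), and StAX (a pull-based streaming API from Java, where your code asks the parser for the next event). A quick search will turn up plenty of material on how each one works.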
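To make the SAX (push) style concrete, here is a toy event parser in plain Ruby using the standard `strscan` library. It walks the text once, regardless of line breaks, and calls `start_element`, `characters`, and `end_element` on whatever handler object it is given. Those callback names are loosely modelled on Nokogiri's `Nokogiri::XML::SAX::Document`, but `sax_parse` and `EventRecorder` are hypothetical. It deliberately ignores attributes, comments, CDATA, entities, and self-closing tags, so for real input a library such as Nokogiri (or REXML's stream parser) is the better choice.

```ruby
require 'strscan'

# Toy SAX-style parser: scans the input once and pushes events to `handler`.
# Only bare <tag>, </tag> and text are understood; anything else raises.
def sax_parse(xml, handler)
  scanner = StringScanner.new(xml)
  until scanner.eos?
    if scanner.scan(%r{</(\w+)\s*>})
      handler.end_element(scanner[1])
    elsif scanner.scan(/<(\w+)\s*>/)
      handler.start_element(scanner[1])
    elsif (text = scanner.scan(/[^<]+/))
      stripped = text.strip
      handler.characters(stripped) unless stripped.empty? # skip indentation
    else
      raise ArgumentError, "unsupported markup at offset #{scanner.pos}"
    end
  end
end

# Example handler that simply records the events it receives.
class EventRecorder
  attr_reader :events

  def initialize
    @events = []
  end

  def start_element(name)
    @events << [:start, name]
  end

  def characters(text)
    @events << [:text, text]
  end

  def end_element(name)
    @events << [:end, name]
  end
end
```

The parser never decides what the output looks like; the handler does. That is exactly the split between parser and builder.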
The code that generates the JSON is called a builder; Builder is a well-known design pattern, and there is plenty written about it online.
A stack is the data structure you will most likely use when implementing the parser and the builder: push when an element opens, pop when it closes.
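Putting the builder and the stack together, here is a minimal sketch (the class name `JsonTreeBuilder` and its `Frame` struct are hypothetical). The builder receives `start_element`, `characters`, and `end_element` calls, keeps the elements that are still open on a stack (a plain Ruby Array), and attaches an element to its parent only once its closing tag arrives. Repeated child names become arrays, as in the expected output above; mixed content (text plus child elements) is simplified by keeping only the children.

```ruby
require 'json'

class JsonTreeBuilder
  # One stack entry per open element: its name, child Hash and collected text.
  Frame = Struct.new(:name, :children, :text)

  def initialize
    @stack = [Frame.new(nil, {}, +"")] # bottom frame collects the root element
  end

  def start_element(name)
    @stack.push(Frame.new(name, {}, +""))
  end

  def characters(text)
    @stack.last.text << text
  end

  def end_element(name)
    frame = @stack.pop
    unless frame.name == name
      raise ArgumentError, "expected </#{frame.name}>, got </#{name}>"
    end
    # An element with child elements becomes a Hash, otherwise its text.
    value = frame.children.empty? ? frame.text : frame.children
    add_child(@stack.last.children, name, value)
  end

  # The finished document as plain Ruby data, ready for JSON.generate.
  def result
    raise ArgumentError, "unclosed elements" unless @stack.size == 1
    @stack.first.children
  end

  private

  # A repeated child name turns the entry into an Array of values.
  def add_child(hash, name, value)
    if hash.key?(name)
      hash[name] = [hash[name]] unless hash[name].is_a?(Array)
      hash[name] << value
    else
      hash[name] = value
    end
  end
end

# Usage: feed it events by hand (or from any parser that emits them).
builder = JsonTreeBuilder.new
builder.start_element("ul")
%w[A B].each do |item|
  builder.start_element("li")
  builder.characters(item)
  builder.end_element("li")
end
builder.end_element("ul")
puts JSON.generate(builder.result)
```

Because the builder depends only on those three method names, any parser that emits them, such as a SAX-style one, can drive it unchanged.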