
Ruby - Convert XML to JSON without using Gems

I recently worked on a coding exercise that involved converting XML to JSON. The sane way to do this could be to use the JSON and ActiveSupport gems as described here. That's what I'd do in production, but it doesn't make me a better coder. So, I've put together my own script, and it works, but frankly I think it's hideous. My question is, how could it be better? What types of techniques and methodologies could I use to make this script simpler, more readable, and more professional?

For reference, we start with the following input.html (clean - no edge cases):

 <html>
     <body>
          <ul>
               <li>Item One</li>
               <li>Item Two</li>
               <li>
                    <ul>
                         <li>A</li>
                         <li>B</li>
                    </ul>     
               </li>
          </ul>
     </body>
</html>

And the JSON output looks like so:

{ "html": { "body": { "ul": { "li": ["Item One", "Item Two", { "ul": { "li": ["A", "B"] }  } ]  }  }  }

And here's the script - *xml_to_json.rb*:

#!/usr/bin/env ruby

def ucode_strip(obj)
    #remove weird whitespace
    return obj.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
end

def element?(obj)
    #returns true if text is an xml element --e.g. "<html>"
    if ucode_strip(obj) =~ /\A<.*?>/
        true
    end
end

def element(obj)
    #returns html element name --e.g. "<html>" => html, "html" => nil
    stripped = ucode_strip(obj)
    parts = stripped.split(/>/)
    return parts[0].sub(/</, '')
end

def value?(obj)
    #does the line contain information inside of tags <tag>value</tag>
    parts = obj.split(/>/)
    unless !parts[1]
        true
    end
end

def value(obj)
    #returns the value of an xml element --e.g. "<li>item</li>" => "item"
    parts = obj.split(/\</)
    parts[0]
end

def convert_file(file)

    text = File.read(file.to_s)

    lines = text.split(/\n/)

    last_tag = nil
    same_tags = nil
    multiple_values = []

    json = "{ "

    lines.each do |line|
        clean = ucode_strip(line)
        if line =~ /<.*?>/
            unless clean =~ /\A<\// #<opening tag>
                line_elements = clean.split(/>/)

                tag = "\"" + element(line) + "\"" + ':'

                if line_elements[1]
                    #there's more data in this line, not just a tag
                    unless same_tags == true
                        same_tags = true
                        json += tag + " ["
                        last_tag = element(line)
                    else
                        json += ", "
                    end

                    json += "\"" + value(line_elements[1]) + "\""
                else
                    #this line only contains an opening tag
                    same_tags = false #stop building list
                    unless element(line) == last_tag #the previous line started with the same tag
                        json += tag += " { "
                    else
                        json += ", { "
                    end
                    last_tag = tag
                end
            else #</closing tag>
                if same_tags == true
                    #we have a closing tag while building a list
                    same_tags = false #stop building list
                    json += "] } " #first close list, then bracket
                else
                    if clean =~ /#{last_tag}/
                        json += " } ] " #there's an open list we have to 
                    else
                        json += " } " #no open list, just close bracket
                    end
                end
            end
        end
    end

    return json

end

input =  ARGV.first
puts convert_file(input)

As I said, this works, but I know it could be better. I realize that there's little-to-no edge case handling, but I'm much more concerned with the way I'm handling the data as a whole. Someone suggested using a Ruby list as a stack to store the nested JSON, but I haven't quite figured that out yet. Any help would be much appreciated - if you've gotten this far, thanks for reading.

The convert_file method is too complex. It mixes the parser logic and the builder logic together, which makes it difficult to read, reuse, and extend. What I can suggest is to separate the two.

The code that parses the XML file is called a parser. As far as I know there are three kinds of parsers: SAX (event-driven, the parser pushes events to your code), DOM (the whole document is loaded into a tree in memory), and StAX (streaming, your code pulls events from the parser). You can find references through Google and see how each can be implemented.
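As a rough illustration of the event-driven (SAX-like) style, here is a minimal sketch with no gems. The method name `each_xml_event` and the event symbols are made up for this example, and it assumes input as clean as the question's: no attributes, comments, CDATA, entities, or self-closing tags.

```ruby
# Hypothetical helper (not from the question): walks the XML text once and
# yields [:open, name], [:text, string] and [:close, name] events in order.
# Assumes clean input: no attributes, comments, CDATA, or self-closing tags.
def each_xml_event(xml)
  return enum_for(:each_xml_event, xml) unless block_given?

  # Three alternatives: a closing tag, an opening tag, or a run of text.
  xml.scan(%r{</([^<>\s]+)>|<([^<>\s/]+)>|([^<]+)}) do |close, open, text|
    if close
      yield [:close, close]
    elsif open
      yield [:open, open]
    else
      # Same Unicode-aware trim as ucode_strip in the question.
      stripped = text.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
      yield [:text, stripped] unless stripped.empty?
    end
  end
end

# Usage: print each event of a small document.
each_xml_event("<ul>\n  <li>A</li>\n</ul>") { |kind, data| puts "#{kind} #{data}" }
```

The point of the split is that this method knows nothing about JSON; whatever consumes the events decides what to build from them.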

The code that generates the JSON is called a builder. Builder is a design pattern; you can find more information about it through Google as well.
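One possible shape for the output side, again without gems, is a small recursive serializer that takes an already-built Ruby structure and returns JSON text. This is only a sketch: `to_json_text`, `json_string`, and `JSON_ESCAPES` are names invented here, only Hash, Array, String, and nil are handled, and only the most common escape sequences are covered.

```ruby
# Hypothetical serializer (names are illustrative, not from the question).
# Escapes only quotes, backslashes, newlines, carriage returns and tabs;
# other control characters are left as-is in this sketch.
JSON_ESCAPES = {
  '"' => '\\"', '\\' => '\\\\', "\n" => '\\n', "\r" => '\\r', "\t" => '\\t'
}.freeze

def json_string(str)
  '"' + str.gsub(/["\\\n\r\t]/) { |ch| JSON_ESCAPES[ch] } + '"'
end

def to_json_text(node)
  case node
  when Hash
    pairs = node.map { |key, value| "#{json_string(key.to_s)}: #{to_json_text(value)}" }
    "{ #{pairs.join(', ')} }"
  when Array
    "[#{node.map { |item| to_json_text(item) }.join(', ')}]"
  when String
    json_string(node)
  when nil
    'null'
  else
    raise ArgumentError, "unsupported value: #{node.inspect}"
  end
end

puts to_json_text({ "ul" => { "li" => ["A", "B"] } })
# prints { "ul": { "li": ["A", "B"] } }
```

Because brackets and braces are emitted by the recursion itself, there is no need to track flags such as same_tags or last_tag to decide when to close a list.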

A stack is a data structure that you may use when implementing the parser and the builder.
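To make the stack idea concrete, here is a minimal sketch that uses a plain Ruby Array as the stack: every opening tag pushes a frame, every closing tag pops one and attaches the result to its parent. The names `Frame`, `add_child`, and `xml_to_tree` are hypothetical, and the same clean-input assumptions apply (no attributes, comments, self-closing tags, or mixed text and child elements).

```ruby
# Hypothetical sketch: one Frame per element that is currently open.
Frame = Struct.new(:name, :children, :text)

# Repeated child names are collected into an Array, as in the question's output.
def add_child(children, name, value)
  if !children.key?(name)
    children[name] = value                     # first child with this name
  elsif children[name].is_a?(Array)
    children[name] << value                    # already a list: append
  else
    children[name] = [children[name], value]   # second child: promote to a list
  end
end

def xml_to_tree(xml)
  stack = [Frame.new(nil, {}, '')] # root frame collects top-level elements

  xml.scan(%r{</([^<>\s]+)>|<([^<>\s/]+)>|([^<]+)}) do |close, open, text|
    if open
      stack.push(Frame.new(open, {}, ''))
    elsif close
      frame = stack.pop
      raise ArgumentError, "unexpected </#{close}>" if stack.empty? || frame.name != close

      # A leaf becomes its text; an element with children becomes a Hash.
      value = frame.children.empty? ? frame.text : frame.children
      add_child(stack.last.children, frame.name, value)
    else
      stack.last.text += text.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
    end
  end

  raise ArgumentError, 'unclosed element' unless stack.size == 1
  stack.first.children
end

p xml_to_tree("<ul><li>A</li><li>B</li></ul>")
```

The depth of the stack always equals the current nesting depth, so mismatched or unclosed tags show up as an error instead of silently producing unbalanced braces.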
