
recursive xml-parsing function not working as intended

I am trying to parse an XML document and use the data to build a (simple) JSON object of this form:

{id: '1', name: 'content-types', children: [{id: '2', name: 'requirements', children: [... and so on ...]}]}

My XML has nodes like the one shown below (I have included only one of them; they can be nested arbitrarily):

<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head/>
  <body class="taxonomies">
    <div class="taxonomy">
      <span class="id">3484069771</span>
      <span class="name">Content Types</span>
      <span class="locale">en</span>
      <div class="concepts">
        <div class="concept">
          <span class="id">3484058507</span>
          <span class="name">Promotional Publications</span>
          <div class="concepts">
            <div class="concept">
              <span class="id">3551765771</span>
              <span class="name">Datasheets</span>
            </div>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>

I use the following code to build the JSON tree from the XML:

buildConceptTree: function(xml){
    const doc = new dom().parseFromString(xml)
    var tree = []
    var selector = "//*[@class='taxonomies']"
    var count = 0  // this should keep track of the depth of the node being used
    function recurse(s, odd){
        var nodes
        console.log(count)
        console.log(s)
        var arr = []

        nodes = xpath.select(s, doc)
        nodes.forEach(node => {
            try {
                var children = node.childNodes
                var keys = Object.keys(children).filter(x => {return Number(x)})
                keys.forEach(key => {
                    var child = children[key]
                    console.log('child is: ')
                    console.log(child)
                    var obj = {}
                    var grandchildren = child.childNodes
                    var grandkeys = Object.keys(grandchildren).filter(x => {return Number(x)})

                    grandkeys.forEach(gk => {
                        var gc = grandchildren[gk]
                        try {
                            var nodevalue = gc['attributes'][0]['nodeValue']
                            switch(nodevalue){
                            case 'id':
                                obj['id'] = gc['textContent']
                            case 'name':
                                obj['name'] = gc['textContent']
                            case 'concepts':
                                count++
                                var rx = /taxonomy/
                                    if(!rx.test(s)){
                                        s = s+"/*[@class='taxonomy']"
                                    }
                                else{
                                    s = s
                                }
                                if (!odd){
                                    s += "/*[@class='concepts']"
                                }
                                else {
                                    s += "/*[@class='concept']"
                                }
                                odd = !odd
                                obj['children'] = recurse(s, odd)
                            }
                        }
                        catch(e){
                        }
                    })
                    arr.push(obj)
                })
            }
            catch(e){
            }

        })
        return arr


    }

    var tree = recurse(selector, false)
    return tree

},

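As it stands, this function produces something resembling the JSON form I described, but many nodes are missing.

Also, my recursive function does not seem to terminate at the simplest case as it recurses down the deeper branches of the XML tree. For example, I get the following logged to the console, even though no node is 191 levels deep: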

    191
     parser.js?d3c4:83 //*[@class='taxonomies']/*[@class='taxonomy']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']/*[@class='concept']/*[@class='concepts']
     parser.js?d3c4:92 child is:

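Can anyone help me figure out how to change this function so that it retrieves the data I need?

I may have missed some requirements, but the problem seems less complicated once you stop looping through all the elements and start querying for the exact elements you need: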

// Parse the xml string to a document
const parser = new DOMParser();
const xmlDoc = parser.parseFromString( getXML(), "text/xml" );

// The main logic to go from an xml element to an object
const parseTaxonomy = (taxonomy, id = 1) => ({
  id,
  name: taxonomy.querySelector(".name")
    .innerText
    .toLowerCase()
    .replace(/\s/g, "-"),
  children: Array.from(
    (taxonomy.querySelector(".concepts") || { children: [] })
      .children
  ).map(t => parseTaxonomy(t, ++id)) // Note the ++
});

// Run on the first taxonomy
// If the top level contains multiple elements, use .map
console.log(
  parseTaxonomy(
    xmlDoc.querySelector(".taxonomy")
  )
);

// The data
function getXML() {
  return `<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head/>
  <body class="taxonomies">
    <div class="taxonomy">
      <span class="id">3484069771</span>
      <span class="name">Content Types</span>
      <span class="locale">en</span>
      <div class="concepts">
        <div class="concept">
          <span class="id">3484058507</span>
          <span class="name">Promotional Publications</span>
          <div class="concepts">
            <div class="concept">
              <span class="id">3551765771</span>
              <span class="name">Datasheets</span>
            </div>
          </div>
        </div>
      </div>
    </div>
  </body>
</html>`;
};
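One caveat about the `++id` in the snippet above: `id` is a parameter local to each call, so when a node has several children and the first of them has children of its own, two nodes in different branches can end up with the same id. If unique ids matter, a counter shared by the whole traversal is one way around it. Below is a minimal sketch of that idea. So that it runs outside a browser, plain objects stand in for the DOM elements; the `taxonomy` data (including the extra "Requirements" node), `slug`, and `buildTree` are names I made up for illustration and are not part of the original code.

```javascript
// Hypothetical stand-in for the parsed XML: each node has a name and an
// optional list of nested concepts (mirroring .name and .concepts > .concept).
const taxonomy = {
  name: "Content Types",
  concepts: [
    {
      name: "Promotional Publications",
      concepts: [{ name: "Datasheets" }],
    },
    { name: "Requirements" },
  ],
};

// Turn "Content Types" into "content-types".
const slug = (name) => name.toLowerCase().replace(/\s/g, "-");

// Build the tree with one counter shared by every recursive call,
// so ids are assigned in visiting order and never repeat.
function buildTree(root) {
  let nextId = 1;
  const visit = (node) => ({
    id: String(nextId++),
    name: slug(node.name),
    // A node without nested concepts maps over an empty array,
    // which is the base case that ends the recursion.
    children: (node.concepts || []).map(visit),
  });
  return visit(root);
}

const tree = buildTree(taxonomy);
console.log(JSON.stringify(tree, null, 2));
```

With real DOM elements, the same `visit` shape should carry over by reading the name and the nested concepts through `querySelector`, as in the snippet above.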

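Note: I changed the part of the XML where you had placed a comment, because the comment was not closed and I expected another wrapper around the child taxonomies.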
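As for why the original function recurses so much deeper than the document, I have not run it, but one thing that stands out is that its `switch` has no `break` statements. In JavaScript a matching `case` falls through into every case below it, so a span with class `id` would also run the `name` and `concepts` branches, which means a recursive call for every `id` and `name` span. The selector `s` is also extended inside the loop and always evaluated against the whole document, so it keeps growing across siblings. The sketch below isolates only the fall-through behaviour; the function names are made up for illustration.

```javascript
// Without `break`, a matching case also runs every case below it.
function classifyWithoutBreak(cls) {
  const hits = [];
  switch (cls) {
    case "id":
      hits.push("id");
    case "name":
      hits.push("name");
    case "concepts":
      hits.push("concepts");
  }
  return hits;
}

// With `break`, only the matching case runs.
function classifyWithBreak(cls) {
  const hits = [];
  switch (cls) {
    case "id":
      hits.push("id");
      break;
    case "name":
      hits.push("name");
      break;
    case "concepts":
      hits.push("concepts");
      break;
  }
  return hits;
}

console.log(classifyWithoutBreak("id")); // id, name and concepts all run
console.log(classifyWithBreak("id"));    // only id runs
```

Note also that `filter(x => { return Number(x) })` in the original drops the key `"0"`, because `Number("0")` is `0`, which is falsy.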
