
Go: parse invalid XML

There is a link to an XML feed: http://www.guru.com/rss/jobs/. When I try to parse it with encoding/xml, I get this error:

XML syntax error on line 1: invalid XML name: t

I understand that this XML is broken, but how can I ignore the broken part and still parse the first few items?

The last item in the XML looks like this:

<item>
    <title>Online Ad Posting Data Entry Jobs</t
    <?xml version="1.0" encoding="utf-8"?>
    <rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
        <channel>
            <title>Guru Jobs</title>
            <link>http://www.guru.com</link>
            <description>Guru Jobs</description>
            <lastBuildDate>Sun, 15 Nov 2015 11:04:51 GMT</lastBuildDate>
            <language>en-us</language>
            <atom:link href='http://www.guru.com/rss/jobs/' rel="self" type="application/rss+xml" />
        </channel>
    </rss>
    itle>
    <link>http://www.guru.com/jobs/online-ad-posting-data-entry-jobs/1189496</link>
    <guid>http://www.guru.com/jobs/online-ad-posting-data-entry-jobs/1189496</guid>
</item> 

Code example:

type Rss2 struct { 
    ItemList []Item `xml:"channel>item"`
}
type Item struct {
    Title       string      `xml:"title"`
    Link        string      `xml:"link"`
    Description string      `xml:"description"`
    PubDate     string      `xml:"pubDate"`
    GUID        string      `xml:"guid"`    
}

r := Rss2{}
reader := bytes.NewReader(xmlRead)
decoder := xml.NewDecoder(reader)
decoder.CharsetReader = charset.NewReaderLabel
decoder.Strict = false
err = decoder.Decode(&r)
if err != nil { fmt.Println(err) }

XML tags must be opened and closed properly. Judging from the XML you posted, the XML declaration is not at the start of the document.

<?xml version="1.0" encoding="utf-8"?>

This should be at the very start. Hope this helps.

The XML in the question appears to be malformed.

Here is a corrected version of the XML file and the Go code.

XML file:

<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
    <title>Guru Jobs</title>
    <link>http://www.guru.com</link>
    <description>Guru Jobs</description>
    <lastBuildDate>Sun, 15 Nov 2015 11:04:51 GMT</lastBuildDate>
    <language>en-us</language>
    <atom:link href='http://www.guru.com/rss/jobs/' rel="self" type="application/rss+xml" />
    <item>
        <title>Imaging for Bespoke Curtain Website</title>
        <link>http://www.guru.com/jobs/imaging-for-bespoke-curtain-website/1203083</link>
        <guid>http://www.guru.com/jobs/imaging-for-bespoke-curtain-website/1203083</guid>
        <description><![CDATA[<b>Description:</b> Hi,We are currently developing a made to measure curtain website and are looking for help in develo...<br><b>Category:</b> Web, Software & IT<br><b>Required skills:</b> ecommerce, imaging software, opencart, web development<br><b>Fixed Price budget:</b> $500-$1k<br><b>Job type:</b> Public<br><b>Freelancer Location:</b> Worldwide<br>]]>
        </description>
        <pubDate>Mon, 04 Jan 2016 12:14:09 GMT</pubDate>
    </item>
</channel>
</rss>

Sample code:

package main

import (
    "encoding/xml"
    "fmt"
    "os"
)

// Rss2 maps the <rss> root; only the <item> elements under <channel> are kept.
type Rss2 struct {
    ItemList []Item `xml:"channel>item"`
}

// Item holds the fields of a single <item> entry.
type Item struct {
    Title       string `xml:"title"`
    Link        string `xml:"link"`
    Description string `xml:"description"`
    PubDate     string `xml:"pubDate"`
    GUID        string `xml:"guid"`
}

func main() {
    r := Rss2{}
    // example2.xml contains the corrected XML shown above.
    xmlContent, err := os.ReadFile("example2.xml")
    if err != nil {
        panic(err)
    }
    if err := xml.Unmarshal(xmlContent, &r); err != nil {
        panic(err)
    }
    fmt.Println("RSS item :", r)
}

Now you can iterate over the items and pull out the data you need from the XML.

