XML parsing returns string with newlines
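I am trying to parse the XML of a sitemap index and then loop over the addresses in it to fetch the details of the posts, in Go. But I am getting this strange error:
: first path segment in URL cannot contain colon
Here is the code snippet: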
type SitemapIndex struct {
	Locations []Location `xml:"sitemap"`
}

type Location struct {
	Loc string `xml:"loc"`
}

func (l Location) String() string {
	return fmt.Sprintf(l.Loc)
}

func main() {
	resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
	bytes, _ := ioutil.ReadAll(resp.Body)
	var s SitemapIndex
	xml.Unmarshal(bytes, &s)
	for _, Location := range s.Locations {
		fmt.Printf("Location: %s", Location.Loc)
		resp, err := http.Get(Location.Loc)
		fmt.Println("resp", resp)
		fmt.Println("err", err)
	}
}
And the output:
Location:
https://www.washingtonpost.com/news-sitemaps/politics.xml
resp <nil>
err parse
https://www.washingtonpost.com/news-sitemaps/politics.xml
: first path segment in URL cannot contain colon
Location:
https://www.washingtonpost.com/news-sitemaps/opinions.xml
resp <nil>
err parse
https://www.washingtonpost.com/news-sitemaps/opinions.xml
: first path segment in URL cannot contain colon
...
...
My guess is that Location.Loc contains a newline before and after the actual address. For example: \nLocation: https://www.washingtonpost.com/news-sitemaps/politics.xml\n
I think so because hard-coding the URL works as expected:
for _, Location := range s.Locations {
	fmt.Printf("Location: %s", Location.Loc)
	test := "https://www.washingtonpost.com/news-sitemaps/politics.xml"
	resp, err := http.Get(test)
	fmt.Println("resp", resp)
	fmt.Println("err", err)
}
The output (as you can see, err is nil):
Location:
https://www.washingtonpost.com/news-sitemaps/politics.xml
resp &{200 OK 200 HTTP/2.0 2 0 map[Server:[nginx] Arc-Service:[api] Arc-Org-Name:[washpost] Expires:[Sat, 02 Feb 2019 05:32:38 GMT] Content-Security-Policy:[upgrade-insecure-requests] Arc-Deployment:[washpost] Arc-Organization:[washpost] Cache-Control:[private, max-age=60] Arc-Context:[index] Arc-Application:[Feeds] Vary:[Accept-Encoding] Content-Type:[text/xml; charset=utf-8] Arc-Servername:[api.washpost.arcpublishing.com] Arc-Environment:[index] Arc-Org-Env:[washpost] Arc-Route:[/feeds] Date:[Sat, 02 Feb 2019 05:31:38 GMT]] 0xc000112870 -1 [] false true map[] 0xc00017c200 0xc0000ca370}
err <nil>
Location:
...
...
But I am very new to Go, so I have no idea what is wrong. Could you tell me where I went wrong?
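You are indeed right: the problem comes from the newlines. As you can see, you are calling Printf without adding any \n, yet the output shows one newline before the URL and one after it.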
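You can use strings.Trim to remove those newlines. Here is an example that works with the sitemap you are trying to parse. Once the string is trimmed, you will be able to call http.Get on it without any error.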
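func main() {
	// bytes holds the sitemap index XML, read with ioutil.ReadAll as in the question.
	var s SitemapIndex
	xml.Unmarshal(bytes, &s)
	for _, Location := range s.Locations {
		// Strip the leading and trailing newlines before using the URL.
		loc := strings.Trim(Location.Loc, "\n")
		fmt.Printf("Location: %s\n", loc)
	}
}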
This code prints the locations without any newlines, as expected:
Location: https://www.washingtonpost.com/news-sitemaps/politics.xml
Location: https://www.washingtonpost.com/news-sitemaps/opinions.xml
Location: https://www.washingtonpost.com/news-sitemaps/local.xml
Location: https://www.washingtonpost.com/news-sitemaps/sports.xml
Location: https://www.washingtonpost.com/news-sitemaps/national.xml
Location: https://www.washingtonpost.com/news-sitemaps/world.xml
Location: https://www.washingtonpost.com/news-sitemaps/business.xml
Location: https://www.washingtonpost.com/news-sitemaps/technology.xml
Location: https://www.washingtonpost.com/news-sitemaps/lifestyle.xml
Location: https://www.washingtonpost.com/news-sitemaps/entertainment.xml
Location: https://www.washingtonpost.com/news-sitemaps/goingoutguide.xml
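One caveat with strings.Trim(Location.Loc, "\n"): the cutset names only the newline character, so indentation or a carriage return around the URL would survive the call. The sketch below compares it with strings.TrimSpace, which removes all leading and trailing whitespace. The sample strings and the helper names trimNewlines and trimAll are made up for illustration and are not part of the original code.

```go
package main

import (
	"fmt"
	"strings"
)

// trimNewlines mirrors the answer above: it strips only '\n' from both ends.
func trimNewlines(s string) string {
	return strings.Trim(s, "\n")
}

// trimAll strips every kind of leading and trailing whitespace.
func trimAll(s string) string {
	return strings.TrimSpace(s)
}

func main() {
	// Hypothetical values: one shaped like the question's data,
	// one with indentation and CRLF line endings.
	samples := []string{
		"\nhttps://example.com/a.xml\n",
		"\r\n    https://example.com/b.xml\r\n  ",
	}
	for _, s := range samples {
		fmt.Printf("Trim: %q  TrimSpace: %q\n", trimNewlines(s), trimAll(s))
	}
}
```

For the sitemap in the question both calls give the same result; TrimSpace is simply the more forgiving choice if the feed's formatting ever changes.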
The reason the Location.Loc field contains these newlines lies in the XML returned by this URL. Its entries have the following form:
<sitemap>
<loc>
https://www.washingtonpost.com/news-sitemaps/goingoutguide.xml
</loc>
</sitemap>
As you can see, the content of the loc element has a newline before and after it.
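To see this without hitting the network, here is a minimal sketch that decodes a hand-written sample shaped like the entries above. The sample constant and the helpers parseLocs and validURL are assumptions introduced for illustration; the struct types are the ones from the question. It prints the decoded value with %q so the newlines are visible, and checks whether net/url accepts the string before and after trimming.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"net/url"
	"strings"
)

type SitemapIndex struct {
	Locations []Location `xml:"sitemap"`
}

type Location struct {
	Loc string `xml:"loc"`
}

// sample is a hand-written stand-in for the real sitemap index.
const sample = `<sitemapindex>
<sitemap>
<loc>
https://www.washingtonpost.com/news-sitemaps/politics.xml
</loc>
</sitemap>
</sitemapindex>`

// parseLocs returns the <loc> values exactly as encoding/xml decodes them,
// including any whitespace inside the element.
func parseLocs(data []byte) ([]string, error) {
	var s SitemapIndex
	if err := xml.Unmarshal(data, &s); err != nil {
		return nil, err
	}
	locs := make([]string, 0, len(s.Locations))
	for _, l := range s.Locations {
		locs = append(locs, l.Loc)
	}
	return locs, nil
}

// validURL reports whether net/url parses the string without an error.
func validURL(raw string) bool {
	_, err := url.Parse(raw)
	return err == nil
}

func main() {
	locs, err := parseLocs([]byte(sample))
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	for _, loc := range locs {
		// %q prints the string with escapes, so hidden newlines show up.
		fmt.Printf("raw:     %q valid: %v\n", loc, validURL(loc))
		clean := strings.TrimSpace(loc)
		fmt.Printf("trimmed: %q valid: %v\n", clean, validURL(clean))
	}
}
```

The exact error text for the untrimmed string depends on the Go version, so the sketch only checks whether parsing fails, not which message is returned.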
See the comments embedded in the modified code below, which describe and fix the issue:
func main() {
	resp, _ := http.Get("https://www.washingtonpost.com/news-sitemaps/index.xml")
	bytes, _ := ioutil.ReadAll(resp.Body)
	var s SitemapIndex
	xml.Unmarshal(bytes, &s)
	for _, Location := range s.Locations {
		// Note: the parentheses around %v make it visible that Location.Loc
		// begins and ends with a newline.
		fmt.Printf("Location: (%v)", Location.Loc)
		// Solution: use strings.TrimSpace to remove the newlines from Location.Loc.
		resp, err := http.Get(strings.TrimSpace(Location.Loc))
		fmt.Println("resp", resp)
		fmt.Println("err", err)
	}
}
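Both snippets above discard the errors from http.Get, ioutil.ReadAll and xml.Unmarshal, which makes problems like this one harder to spot. As a rough sketch of one alternative (not the original poster's code), the decoding and trimming can live in a helper that takes any io.Reader and reports failures. The names readLocations and parseString and the inline sample are assumptions for illustration; in the real program the reader would be resp.Body.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

type SitemapIndex struct {
	Locations []Location `xml:"sitemap"`
}

type Location struct {
	Loc string `xml:"loc"`
}

// readLocations decodes a sitemap index from r and returns the <loc> URLs
// with surrounding whitespace removed. Empty entries are skipped.
func readLocations(r io.Reader) ([]string, error) {
	var s SitemapIndex
	if err := xml.NewDecoder(r).Decode(&s); err != nil {
		return nil, fmt.Errorf("decoding sitemap index: %w", err)
	}
	urls := make([]string, 0, len(s.Locations))
	for _, l := range s.Locations {
		if u := strings.TrimSpace(l.Loc); u != "" {
			urls = append(urls, u)
		}
	}
	return urls, nil
}

// parseString is a convenience wrapper for documents already held in memory.
func parseString(doc string) ([]string, error) {
	return readLocations(strings.NewReader(doc))
}

func main() {
	// Hand-written sample; with a live request, pass resp.Body to readLocations.
	const sample = `<sitemapindex>
<sitemap><loc>
https://www.washingtonpost.com/news-sitemaps/politics.xml
</loc></sitemap>
<sitemap><loc>
https://www.washingtonpost.com/news-sitemaps/opinions.xml
</loc></sitemap>
</sitemapindex>`

	urls, err := parseString(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, u := range urls {
		fmt.Printf("Location: %s\n", u)
	}
}
```

Taking an io.Reader keeps the helper independent of net/http, so it can be exercised with a string in tests and with a response body in the real program.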