GoLang - XmlPath Selectors with HTML
I am looking at the documented example here, but it iterates purely over an XML tree, not HTML. Therefore, I am still partly confused.
For example, if I wanted to find a specific meta tag within the head tag by name, it seems I cannot? Instead, I need to find it by its position within the head tag. In this case, I want the 8th meta tag, which I assume is:
headTag, err := getByID(xmlroot, "/head/meta[8]/")
But of course, this uses a getByID function for a tag name, which I don't believe will work. What is the full list of "getBy..." commands?
Then the problem is: how do I access the meta tag's contents? The documentation only provides examples for the inner text of a tag node. Would this example work?
resp.Query = extractValue(headTag, `@content`)
The @ selector confuses me. Is it appropriate in this case?
In other words:
Thank you very much!
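On the @ question: in XPath, the @ prefix selects an attribute, so `//meta[@name='viewport']/@content` means "the content attribute of the meta element whose name attribute is viewport". A positional predicate such as `meta[8]` counts from 1. As a rough standard-library analogy (not XPath, and only a sketch that assumes simple, mostly well-formed markup), the `",attr"` option in `encoding/xml` struct tags plays the role of @, and a positional predicate becomes a 0-based slice index. The type and function names below are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// meta mirrors one <meta> element. The ",attr" option reads an attribute,
// roughly what @name and @content mean in an XPath expression.
type meta struct {
	Name    string `xml:"name,attr"`
	Content string `xml:"content,attr"`
}

// page collects every <meta> under <head> of the root element, in document order.
type page struct {
	Metas []meta `xml:"head>meta"`
}

// parsePage (hypothetical helper) decodes loosely written HTML using
// encoding/xml's non-strict mode; void elements such as <meta> auto-close.
func parsePage(src string) (page, error) {
	var p page
	dec := xml.NewDecoder(strings.NewReader(src))
	dec.Strict = false
	dec.AutoClose = xml.HTMLAutoClose
	dec.Entity = xml.HTMLEntity
	err := dec.Decode(&p)
	return p, err
}

func main() {
	p, err := parsePage(`<html><head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
</head><body></body></html>`)
	if err != nil {
		panic(err)
	}

	// By position: XPath meta[2] is 1-based, Go slice indexes are 0-based.
	if len(p.Metas) >= 2 {
		fmt.Println(p.Metas[1].Content)
	}

	// By name: the analogue of //meta[@name='viewport']/@content.
	for _, m := range p.Metas {
		if m.Name == "viewport" {
			fmt.Println(m.Content)
		}
	}
}
```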
XPath does not seem suitable here; you should use goquery, which is designed for HTML.
Here is an example:
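package main

import (
	"fmt"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	// Fetch the page ourselves; goquery.NewDocument(url) is deprecated.
	res, err := http.Get("https://example.com")
	if err != nil {
		panic(err)
	}
	defer res.Body.Close()

	doc, err := goquery.NewDocumentFromReader(res.Body)
	if err != nil {
		panic(err)
	}

	// CSS selector: a <meta> child of <head> whose name attribute is "viewport".
	s := doc.Find(`html > head > meta[name="viewport"]`)
	if s.Length() == 0 {
		fmt.Println("could not find viewport")
		return
	}
	// AttrOr returns the attribute's value, or the default if it is absent.
	fmt.Println(s.Eq(0).AttrOr("content", ""))
}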
I know this answer is late, but I still want to recommend the htmlquery package, which is simple and powerful and is based on XPath expressions.
The code below is based on @Time-Cooper's example.
package main

import (
	"fmt"

	"github.com/antchfx/htmlquery"
)

func main() {
	// LoadURL fetches the page and parses it into an *html.Node tree.
	doc, err := htmlquery.LoadURL("https://example.com")
	if err != nil {
		panic(err)
	}

	// XPath: any <meta> element whose name attribute is "viewport".
	s := htmlquery.Find(doc, "//meta[@name='viewport']")
	if len(s) == 0 {
		fmt.Println("could not find viewport")
		return
	}
	fmt.Println(htmlquery.SelectAttr(s[0], "content"))

	// Alternative, and simpler: select the content attribute itself with @content.
	s2 := htmlquery.FindOne(doc, "//meta[@name='viewport']/@content")
	if s2 != nil {
		fmt.Println(htmlquery.InnerText(s2))
	}
}