GoLang - XmlPath selectors with HTML

Question

I am looking at the documented example here, but it iterates purely over an XML tree, not HTML, so I am still partly confused.

For example, if I wanted to find a specific meta tag within the head tag by name, it seems I cannot? Instead, I need to find it by its position in the head tag. In this case, I want the 8th meta tag, which I assume is:

headTag, err := getByID(xmlroot, "/head/meta[8]/")

But of course, this uses a getByID function for a tag name, which I don't believe will work. What is the full list of "getBy..." commands?

Then the problem is: how do I access the meta tag's contents? The documentation only provides examples for the inner tag node content. However, will this example work?

resp.Query = extractValue(headTag, @content)

The @ selector confuses me; is it appropriate for this case?

In other words:

Is there a proper HTML example available?
Is there a list of correct selectors for IDs, tags, etc.?
Can tags be found by name, and content extracted from their inner content tag?

Thank you very much!

Answer 1

XPath does not seem suitable here; you should be using goquery, which is designed for HTML.

Here is an example:

package main

import (
    "fmt"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    doc, err := goquery.NewDocument("https://example.com")
    if err != nil {
        panic(err)
    }
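    // Select the meta tag by its name attribute rather than by position.
    s := doc.Find(`html > head > meta[name="viewport"]`)
    if s.Length() == 0 {
        fmt.Println("could not find viewport")
        return
    }
    // Read the content attribute of the first match.
    fmt.Println(s.Eq(0).AttrOr("content", ""))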
}

Answer 2

I know this answer is late, but I still want to recommend the htmlquery package, which is simple and powerful and based on XPath expressions.

The code below is based on @Time-Cooper's example.

package main

import (
    "fmt"

    "github.com/antchfx/htmlquery"
)

func main() {
    doc, err := htmlquery.LoadURL("https://example.com")
    if err != nil {
        panic(err)
    }
    // "//meta[@name='viewport']" matches meta elements by their name attribute.
    s := htmlquery.Find(doc, "//meta[@name='viewport']")
    if len(s) == 0 {
        fmt.Println("could not find viewport")
        return
    }
    fmt.Println(htmlquery.SelectAttr(s[0], "content"))

    // Alternative, simpler method: select the content attribute directly with @content.
    s2 := htmlquery.FindOne(doc, "//meta[@name='viewport']/@content")
    fmt.Println(htmlquery.InnerText(s2))
}

2 answers

Answer 1: 4, accepted, 2017-02-08 19:59:56

Answer 2: 1, 2018-12-09 02:48:00
