GoLang - XmlPath Selectors with HTML

I am looking at the documented example here, but it iterates purely over an XML tree, not HTML, so I am still partly confused.

For example, if I want to find a specific meta tag within the head tag by name, it seems I cannot. Instead, I need to find it by its position within the head tag. In this case, I want the 8th meta tag, which I assume is:

headTag, err := getByID(xmlroot, "/head/meta[8]/")

But of course, this uses a getByID function with a tag name, which I don't believe will work. What is the full list of "getBy..." commands?

Then the problem is: how do I access the meta tag's contents? The documentation only provides examples for the inner content of a tag node. Would this example work?

resp.Query = extractValue(headTag, @content)

The @ selector confuses me. Is it appropriate for this case?

In other words:

  1. Is there a proper HTML example available?
  2. Is there a list of correct selectors for IDs, tags, etc.?
  3. Can tags be found by name, and the value of their content attribute extracted?

Thank you very much!

XPath does not seem suitable here; you should use goquery, which is designed for HTML.

Here is an example:

package main

import (
    "fmt"

    "github.com/PuerkitoBio/goquery"
)

func main() {
    doc, err := goquery.NewDocument("https://example.com")
    if err != nil {
        panic(err)
    }
    s := doc.Find(`html > head > meta[name="viewport"]`)
    if s.Length() == 0 {
        fmt.Println("could not find viewport")
        return
    }
    fmt.Println(s.Eq(0).AttrOr("content", ""))
}

I know this answer is late, but I still want to recommend the htmlquery package, which is simple, powerful, and based on XPath expressions.

The code below is based on @Time-Cooper's example.

package main

import (
    "fmt"

    "github.com/antchfx/htmlquery"
)

func main() {
    doc, err := htmlquery.LoadURL("https://example.com")
    if err != nil {
        panic(err)
    }
    s := htmlquery.Find(doc, "//meta[@name='viewport']")
    if len(s) == 0 {
        fmt.Println("could not find viewport")
        return
    }
    fmt.Println(htmlquery.SelectAttr(s[0], "content"))

    // Alternative, simpler method: select the content attribute directly.
    s2 := htmlquery.FindOne(doc, "//meta[@name='viewport']/@content")
    fmt.Println(htmlquery.InnerText(s2))
}
