HTML validation with Golang

Question

Within my API I have a POST endpoint. One of the parameters expected in requests to that endpoint is a block of (loosely) valid HTML.

The POST body will be JSON.

Within Go, how can I ensure that the posted HTML is valid? I have been searching for a few days now and still haven't found anything.

The term "valid" is used loosely here. I am trying to ensure that tags are opened and closed, quotation marks are in the right places, etc.

Answer 1

You can check that the provided HTML blob parses correctly using html.Parse from the golang.org/x/net/html package. For validation only, all you have to do is check the returned error.

Answer 2

A bit late to the game, but here are a couple of parsers in Go that will work if you just want to validate the structure of the HTML (e.g., you don't care whether a div is inside a span, which is not allowed but is a schema-level problem):

x/net/html

The golang.org/x/net/html package contains a very loose parser. Almost any input will be accepted, much as many web browsers try to do (e.g., in many cases it will ignore problems with unescaped values). For example, something like <span>></span> will likely validate as a span containing the '>' character (I didn't check this particular one, I just made it up).

It can be used like this:

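import (
    "io"
    "strings"

    "golang.org/x/net/html"
)

// validate tokenizes s and returns the first tokenizer error, if any.
// Usage: err := validate(`<span>></span>`)
func validate(s string) error {
    r := strings.NewReader(s)
    z := html.NewTokenizer(r)
    for {
        tt := z.Next()
        if tt == html.ErrorToken {
            err := z.Err()
            if err == io.EOF {
                // Not an error, we're done and it's valid!
                return nil
            }
            return err
        }
    }
}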

encoding/xml

If you need something a tiny bit stricter that still works for HTML, you can configure an xml.Decoder to handle HTML. This is what I do; it lets me be a bit more flexible about how strict I want to be in any given situation:

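import (
    "encoding/xml"
    "io"
    "strings"
)

// validate decodes s with an xml.Decoder configured for HTML and returns
// the first syntax error, if any.
// Usage: err := validate(`<html></html>`)
func validate(s string) error {
    r := strings.NewReader(s)
    d := xml.NewDecoder(r)

    // Configure the decoder for HTML; leave off Strict and AutoClose for XHTML
    d.Strict = false
    d.AutoClose = xml.HTMLAutoClose
    d.Entity = xml.HTMLEntity
    for {
        _, err := d.Token()
        switch err {
        case io.EOF:
            return nil // We're done, it's valid!
        case nil:
            // Token read successfully; keep going.
        default:
            return err // Oops, something wasn't right
        }
    }
}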
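Here is a minimal sketch that ties this back to the question's JSON POST endpoint, using only the standard library. It assumes the request body looks like `{"html": "..."}`. The `html` field name and the `postBody`, `validHTML` and `validatePayload` names are all hypothetical. `validHTML` repeats the lenient xml.Decoder setup from this answer. Per the `Decoder.Token` documentation, input that ends while an element is still open is reported as an error.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// postBody is an assumed request shape, e.g. {"html": "<p>hi</p>"}.
// The "html" field name is hypothetical.
type postBody struct {
	HTML string `json:"html"`
}

// validHTML runs the lenient, HTML-configured xml.Decoder check.
func validHTML(s string) error {
	d := xml.NewDecoder(strings.NewReader(s))
	d.Strict = false
	d.AutoClose = xml.HTMLAutoClose
	d.Entity = xml.HTMLEntity
	for {
		_, err := d.Token()
		if err == io.EOF {
			return nil // consumed all input without a syntax error
		}
		if err != nil {
			return err
		}
	}
}

// validatePayload decodes a JSON request body and validates its HTML field.
// An http.HandlerFunc could pass the body bytes here and reply with
// http.StatusBadRequest when it returns an error.
func validatePayload(raw []byte) error {
	var body postBody
	if err := json.Unmarshal(raw, &body); err != nil {
		return fmt.Errorf("invalid JSON: %w", err)
	}
	if err := validHTML(body.HTML); err != nil {
		return fmt.Errorf("invalid HTML: %w", err)
	}
	return nil
}
```

Keeping the check in a plain function rather than inside the handler makes it easy to unit-test without starting an HTTP server.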


Answer 1
2 2015-08-03 15:13:18

Answer 2
1 2018-09-19 16:36:37
