HTML Validation with Golang

Within my API I have a POST endpoint. One of the expected parameters posted to that endpoint is a block of (loosely) valid HTML.

The POST body will be JSON.

In Go, how can I ensure that the posted HTML is valid? I have been looking for a few days now and still haven't managed to find anything.

The term "valid" is kind of loose. I am trying to ensure that tags are opened and closed, quotation marks are in the right places, etc.

You can check that the provided HTML blob parses correctly using html.Parse from the golang.org/x/net/html package. For validation only, all you have to do is check for errors.

A bit late to the game, but here are a couple of parsers in Go that will work if you just want to validate the structure of the HTML (e.g. you don't care whether a div is inside a span, which is not allowed but is a schema-level problem):

x/net/html

The golang.org/x/net/html package contains a very loose parser. Almost anything will result in valid HTML, similar to what a lot of web browsers try to do (e.g. it will ignore problems with unescaped values in many cases). For example, something like <span>></span> will likely validate (I didn't check this particular one, I just made it up) as a span with the '>' character in it.

It can be used something like this:

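import (
    "io"
    "strings"

    "golang.org/x/net/html"
)

// validate tokenizes a sample fragment and returns nil if the tokenizer
// reaches the end of the input without reporting an error.
func validate() error {
    r := strings.NewReader(`<span>></span>`)
    z := html.NewTokenizer(r)
    for {
        tt := z.Next()
        if tt == html.ErrorToken {
            err := z.Err()
            if err == io.EOF {
                // Not an error, we're done and it's valid!
                return nil
            }
            return err
        }
    }
}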

encoding/xml

If you need something a tiny bit more strict, but which is still okay for HTML, you can configure an xml.Decoder to work with HTML (this is what I do; it lets me be a bit more flexible about how strict I want to be in any given situation):

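import (
    "encoding/xml"
    "io"
    "strings"
)

// validate reads tokens from a sample document until EOF or the first error.
func validate() error {
    r := strings.NewReader(`<html></html>`)
    d := xml.NewDecoder(r)

    // Configure the decoder for HTML; for XHTML, skip the Strict and AutoClose settings
    d.Strict = false
    d.AutoClose = xml.HTMLAutoClose
    d.Entity = xml.HTMLEntity
    for {
        // Only the error matters here, so the token itself is discarded.
        _, err := d.Token()
        switch err {
        case io.EOF:
            return nil // We're done, it's valid!
        case nil:
            // Token read successfully; keep going.
        default:
            return err // Oops, something wasn't right
        }
    }
}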
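
To tie this back to the question, the decoder configuration above could be wrapped in a helper and applied to a field of the posted JSON. The following is only a sketch under assumptions: the `payload` type, the `html` field name, and the `validateHTML` and `validatePayload` function names are all made up for illustration and do not come from the original posts.

```go
package main

import (
	"encoding/json"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// payload is a hypothetical request body; the "html" field name is an assumption.
type payload struct {
	HTML string `json:"html"`
}

// validateHTML reads every token with a non-strict xml.Decoder configured
// for HTML and returns the first error the decoder reports, or nil at EOF.
func validateHTML(fragment string) error {
	d := xml.NewDecoder(strings.NewReader(fragment))
	d.Strict = false
	d.AutoClose = xml.HTMLAutoClose
	d.Entity = xml.HTMLEntity
	for {
		_, err := d.Token()
		if err == io.EOF {
			return nil
		}
		if err != nil {
			return err
		}
	}
}

// validatePayload decodes the JSON body and validates its HTML field.
func validatePayload(body []byte) error {
	var p payload
	if err := json.Unmarshal(body, &p); err != nil {
		return fmt.Errorf("invalid JSON: %w", err)
	}
	return validateHTML(p.HTML)
}

func main() {
	bodies := []string{
		`{"html": "<p>Hello <b>world</b><br>bye</p>"}`,
		`{"html": "<div><p>never closed"}`,
		`{"html": "</div>"}`,
	}
	for _, b := range bodies {
		// A nil result means the decoder reached EOF without complaint.
		fmt.Printf("%s -> %v\n", b, validatePayload([]byte(b)))
	}
}
```

In an HTTP handler, a non-nil result from something like validatePayload would typically be turned into a 400 response. Which inputs get rejected depends on the decoder settings, so it is worth running the helper against samples of the HTML you actually expect to receive.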
