Regex to find images in HTML (golang)

Question

I'm parsing an XML RSS feed from a couple of different sources and I want to find the images in the HTML.

I did some research and found a regex that I think might work:

/<img[^>]+src="?([^"\s]+)"?\s*\/>/g

but I have trouble using it in Go. It gives me errors because I don't know how to make it search with that expression.

I tried using it as a string, but it doesn't escape properly with either single or double quotes. I also tried using it bare, just like that, and it gives me an error.

Any ideas?

Answer 1

Using a proper HTML parser is always better for parsing HTML. However, a cheap, hackish regex can also work fine. Here's an example:

var imgRE = regexp.MustCompile(`<img[^>]+\bsrc=["']([^"']+)["']`)
// If your img tags always use double quotes, use this instead; it's more efficient.
// var imgRE = regexp.MustCompile(`<img[^>]+\bsrc="([^"]+)"`)
func findImages(htm string) []string {
    imgs := imgRE.FindAllStringSubmatch(htm, -1)
    out := make([]string, len(imgs))
    for i := range out {
        out[i] = imgs[i][1]
    }
    return out
}

playground
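For comparison, here is a minimal sketch of how the pattern from the question could be compiled in Go. The assumptions: the surrounding `/.../g` is JavaScript regex-literal syntax and is dropped, the pattern goes in a raw string literal (backticks) so the quotes and backslashes need no extra escaping, and `FindAllStringSubmatch` with `-1` plays the role of the `g` flag. The names `questionRE` and `firstGroups` are made up for this illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// questionRE is the question's pattern without the /.../g wrapper.
// In a raw string literal the double quotes and \s are written as-is,
// and the escaped \/ becomes a plain slash.
var questionRE = regexp.MustCompile(`<img[^>]+src="?([^"\s]+)"?\s*/>`)

// firstGroups returns capture group 1 of every match of re in s.
func firstGroups(re *regexp.Regexp, s string) []string {
	var out []string
	for _, m := range re.FindAllStringSubmatch(s, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	htm := `<p><img src="a.png" /><img src="b.png" alt="b"></p>`
	fmt.Println(firstGroups(questionRE, htm))
}
```

Note that this pattern only matches self-closing tags in which `src` is the last attribute, so the second tag in the sample is skipped. The pattern in the answer above is more lenient.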

Answer 2

Sorry, I have not worked with Go before, but this seems to work. I tried it at https://tour.golang.org/welcome/1.

package main

import (
    "fmt"
    "regexp"
)

func main() {
    myString := `<img src='img1single.jpg'><img src="img2double.jpg">`
    // Capture group 1 holds the src value, quoted with either ' or ".
    myRegex := regexp.MustCompile(`<img[^>]+\bsrc=["']([^"']+)["']`)
    imgTags := myRegex.FindAllStringSubmatch(myString, -1)
    // Each match is a slice: index 0 is the whole match, index 1 is the src.
    for _, tag := range imgTags {
        fmt.Println(tag[1])
    }
}

I suggest using HtmlAgilityPack (a .NET library) to parse any DOM/XML-like content.

Load the document with:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(sourceHtml);

Then query it with an XPath expression. Regex is fine, but capture groups and similar issues make the job complex.

doc.DocumentNode.SelectSingleNode(XPath here)

or, for example:

doc.DocumentNode.SelectNodes("//img")  // this should give all img tags

I suggest this because it seems the RSS feed serves some HTML content ;) So get the XML, parse it with XMLDoc, extract the HTML content you need, and then get all the images this way. I am leaving this here as an open answer.
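Since the question is about Go, the same flow (decode the XML, pull out the embedded HTML, then collect the images) could be sketched with the standard library roughly as follows. This is only an illustration under assumptions: the feed is RSS 2.0 with the HTML stored in each item's `description`, the regex from the other answer stands in for a real HTML parser, and `rssFeed`, `imagesInFeed`, and `sampleFeed` are hypothetical names invented here.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"regexp"
)

// rssFeed models only the part of an RSS 2.0 document used here
// (assumption: the HTML lives in channel > item > description).
type rssFeed struct {
	Items []struct {
		Description string `xml:"description"`
	} `xml:"channel>item"`
}

// imgRE is the pattern from the other answer; group 1 is the src value.
var imgRE = regexp.MustCompile(`<img[^>]+\bsrc=["']([^"']+)["']`)

// imagesInFeed decodes the feed and returns the src of every <img>
// found in the item descriptions.
func imagesInFeed(data []byte) ([]string, error) {
	var feed rssFeed
	if err := xml.Unmarshal(data, &feed); err != nil {
		return nil, err
	}
	var out []string
	for _, item := range feed.Items {
		for _, m := range imgRE.FindAllStringSubmatch(item.Description, -1) {
			out = append(out, m[1])
		}
	}
	return out, nil
}

// sampleFeed is made-up data: one description uses CDATA, the other
// uses escaped entities. The XML decoder unescapes both.
const sampleFeed = `<?xml version="1.0"?>
<rss version="2.0"><channel>
<item><description><![CDATA[<p><img src="one.jpg"></p>]]></description></item>
<item><description>&lt;img alt="x" src='two.png'&gt;</description></item>
</channel></rss>`

func main() {
	imgs, err := imagesInFeed([]byte(sampleFeed))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(imgs)
}
```

Decoding the XML first matters because the HTML inside a feed is usually escaped or wrapped in CDATA, so running the regex on the raw feed text could miss escaped tags.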

After the comment, I think only a regex is needed; my pattern is

 <img.+?src=[\"'](.+?)[\"'].*?>

for the input

<img src='img1single.jpg'>
<img src="img2double.jpg">

and the result seems fine. In .NET, you get each value by iterating over the matches with foreach and reading

.Groups[1].Value
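A rough Go counterpart, as a sketch rather than a tested port of the .NET code: Go's regexp package also accepts the lazy quantifiers `.+?` and `.*?`, and index 1 of each submatch slice corresponds to `.Groups[1].Value`. The names `lazyImgRE` and `srcValues` are made up for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// lazyImgRE is the pattern above. In a Go raw string the quotes inside
// the character classes need no backslashes.
var lazyImgRE = regexp.MustCompile(`<img.+?src=["'](.+?)["'].*?>`)

// srcValues returns capture group 1 (the src value) of every match.
func srcValues(s string) []string {
	matches := lazyImgRE.FindAllStringSubmatch(s, -1)
	out := make([]string, 0, len(matches))
	for _, m := range matches {
		out = append(out, m[1])
	}
	return out
}

func main() {
	input := "<img src='img1single.jpg'>\n<img src=\"img2double.jpg\">"
	for _, src := range srcValues(input) {
		fmt.Println(src)
	}
}
```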

Regards.


Answer 1: score 2, accepted, 2016-05-01 12:31:01

Answer 2: score -2, 2016-05-01 11:30:24
