C# regex and HTML, ending at the first "

Question

I want to get a URL from a string. Here's my code to extract an img URL.

        var imgReg = new Regex("img\\s*src\\s*=\\s*\"(.*)\"");
        string imgLink = imgReg.Match(page, l, r - l).Groups[1].Value;

The result was:

http://url.com/file.png" border="0" alt="

How do I fix this so it ends at the first "? I tried something like:

        var imgReg = new Regex("img\\s*src\\s*=\\s*\"(.*[^\\\"])\"");

But I got the same result as with the original.

Answer 1

Try this:

var imgReg = new Regex(@"img\s+src\s*=\s*""([^""']*)""");

Also, note the "\\s+" instead of "\\s*" after "img" (written \s+ in the verbatim string above). You need at least one whitespace character there.

You can also use the non-greedy (or "lazy") version of the star operator, which, instead of matching as much as possible, matches as little as possible and stops, as you would like, at the first closing quote:

var imgReg = new Regex(@"img\s+src\s*=\s*""(.*?)""");

(Note the "?" after ".*".)
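To make the difference concrete, here is a small sketch that runs the greedy pattern and the two suggested fixes against the same tag. The class name ImgSrcDemo and the helper Capture are made up for illustration; only Regex, Match and Groups come from System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of the original question or answer.
public static class ImgSrcDemo
{
    // Greedy: .* consumes as much as it can, so the capture ends at the LAST quote.
    public static readonly Regex Greedy =
        new Regex(@"img\s+src\s*=\s*""(.*)""");

    // Negated class: the capture cannot contain a quote, so it stops at the first one.
    public static readonly Regex NegatedClass =
        new Regex(@"img\s+src\s*=\s*""([^""']*)""");

    // Lazy: .*? grows one character at a time until the rest of the pattern matches.
    public static readonly Regex Lazy =
        new Regex(@"img\s+src\s*=\s*""(.*?)""");

    // Returns the first capture group, or an empty string when nothing matches.
    public static string Capture(Regex pattern, string html)
    {
        Match m = pattern.Match(html);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

For the tag `<img src="http://url.com/file.png" border="0" alt="" />`, Capture with Greedy reproduces the over-long result from the question, while NegatedClass and Lazy both return http://url.com/file.png.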

Answer 2

Please consider using a DOM (such as the Html Agility Pack) to parse HTML rather than using regular expressions. A DOM should handle all edge cases; regular expressions won't.

Answer 3

Your .* is too greedy. Change it to the following and it will select everything up to the next quote.

Source Text:  <img src="http://url.com/file.png" border="0" alt="" />
              <img src='http://url.com/file.png' border='0' alt='' />

RegEx:        <img\s*src\s*=\s*[\"\']([^\"\']+)[\"\']

I just changed the (.*) to ([^\"\']+). This means that you'll grab every character that is not a quote, up to the next part of the regex. It also supports single- or double-quotes.
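As a rough sketch of how that pattern could be used from C#, the snippet below writes it as a verbatim string (so each double quote is doubled) and collects every captured URL. The class QuoteStyleDemo and the method ExtractAll are illustrative names, not something from the answer.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper around the pattern from this answer.
public static class QuoteStyleDemo
{
    // Same pattern as above; inside @"..." a double quote is written as "".
    private static readonly Regex ImgSrc =
        new Regex(@"<img\s*src\s*=\s*[""']([^""']+)[""']");

    // Returns the src value of every img tag the pattern finds, in document order.
    public static List<string> ExtractAll(string html)
    {
        var urls = new List<string>();
        foreach (Match m in ImgSrc.Matches(html))
        {
            urls.Add(m.Groups[1].Value);
        }
        return urls;
    }
}
```

Called on the two source lines above, ExtractAll returns http://url.com/file.png twice, once for each quote style. Note that the pattern does not check that the closing quote is the same kind as the opening one, and a URL containing an apostrophe would be cut short.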

Answer 4

What it looks like to me is that your (.*) is catching the double quotes you don't want to match.

You can use "" inside a verbatim string (@"...") to match a double quote, or do something like this for your link matching:

Regex.Match(input, @"http://[\w./]+\.png");
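The two escaping styles are easy to mix up, so here is a minimal sketch showing that a regular string and a verbatim string can spell the same pattern, followed by a URL-only match. It assumes the intent of the snippet above was "http://, then word characters, dots and slashes, then a literal .png"; the names EscapingDemo, SamePattern and FirstPngUrl are invented for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical class for illustration only.
public static class EscapingDemo
{
    // Regular string: backslashes are doubled and quotes are written as \".
    public const string Regular = "src\\s*=\\s*\"([^\"]*)\"";

    // Verbatim string: backslashes are literal and quotes are written as "".
    public const string Verbatim = @"src\s*=\s*""([^""]*)""";

    // Both constants hold the same characters, so they compile to the same regex.
    public static bool SamePattern()
    {
        return Regular == Verbatim;
    }

    // Assumed intent: match an http URL made of word characters, dots and
    // slashes that ends in a literal ".png". Returns "" when there is no match.
    public static string FirstPngUrl(string input)
    {
        Match m = Regex.Match(input, @"http://[\w./]+\.png");
        return m.Success ? m.Value : string.Empty;
    }
}
```

Writing \\w inside a verbatim string would ask the regex for a literal backslash followed by "w", which is why the verbatim form uses a single backslash. This URL-only approach ignores the surrounding tag entirely, so it will also pick up .png links that are not in an img element.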

4 answers

Answer 1: score 4, accepted, 2009-11-09 21:46:31

Answer 2: score 3, 2009-11-09 21:51:51

Answer 3: score 1, 2009-11-09 21:54:02

Answer 4: score 0, 2009-11-09 21:49:17
