
Extract all images from html string using Regex

I'm trying to use Regex to extract all image sources from an HTML string. For a couple of reasons I cannot use HTML Agility Pack.

I need to extract 'gfx/image.jpg' from strings which look like:

<table cellpadding="0" cellspacing="0"  border="0" style="height:350px; margin:0; background: url('gfx/image.jpg') no-repeat;">
<table cellpadding="0" cellspacing="0" border="0" background="gfx/image.jpg" style=" width:700px; height:250px; "><tr><td valign="middle">

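You can use this regex: (['"])([^'"]+\.jpg)\1 and then read Groups[2]. The first group captures the opening quote, the second group captures the path ending in .jpg, and the backreference \1 requires the same kind of quote to close it. This code works fine: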

using System;
using System.Text.RegularExpressions;

var str = @"<table cellpadding=""0"" cellspacing=""0""  border=""0"" style=""height:350px; margin:0; background: url('gfx/image.jpg') no-repeat;"">
<table cellpadding=""0"" cellspacing=""0"" border=""0"" background=""gfx/image.jpg"" style="" width:700px; height:250px; ""><tr><td valign=""middle"">";
// Group 1: opening quote; group 2: path ending in .jpg; \1: matching closing quote.
var regex = new Regex(@"(['""])([^'""]+\.jpg)\1");
var match = regex.Match(str);
// Walk through every match in the string, not just the first one.
while (match.Success)
{
    Console.WriteLine(match.Groups[2].Value); // the captured image path
    match = match.NextMatch();
}
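
The pattern above only matches paths ending in lowercase .jpg. Since the question asks for all images, here is a minimal sketch of the same idea extended to a few more extensions and made case-insensitive. The class name ImageSourceExtractor, the method name Extract, and the extension list (jpg, jpeg, png, gif) are assumptions for illustration, not part of the original answer. Like the original, it only finds quoted paths, so an unquoted url(gfx/image.jpg) or a path followed by a query string would be missed.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps the answer's regex so it can be reused.
public static class ImageSourceExtractor
{
    // Same structure as the original pattern: (quote)(path.ext)(same quote).
    // The extension list is an assumption; adjust it to the formats you expect.
    private static readonly Regex ImagePath = new Regex(
        @"(['""])([^'""]+\.(?:jpe?g|png|gif))\1",
        RegexOptions.IgnoreCase);

    // Returns every quoted image path found in the given HTML string,
    // in the order in which the paths appear.
    public static List<string> Extract(string html)
    {
        var results = new List<string>();
        foreach (Match m in ImagePath.Matches(html))
        {
            // Group 2 holds the path without the surrounding quotes.
            results.Add(m.Groups[2].Value);
        }
        return results;
    }
}
```

Calling ImageSourceExtractor.Extract(str) on the two table tags from the question should return gfx/image.jpg twice, once for the url('...') value and once for the background attribute.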

