C# Regex img src

I want to get the image links from the src attributes of the img tags in some HTML. I read the HTML into a string and pass it to a method that returns an ArrayList of the image URLs.

I pass the method the HTML string and the URL of the webpage.

I need help with a regex that captures the image name including its extension. Help with running the match against the HTML string would be a bonus. I will accept the right answer or something close to it. Thank you all.

I know about HTML parsers, but I would rather use this approach. Thank you.

Here is my method:

    using System;
    using System.Collections;
    using System.Text.RegularExpressions;

    private ArrayList GetImageLinks(String inputHTML, String link)
    {
        ArrayList imageLinks = new ArrayList();
        // In a verbatim string (@"...") a double quote is escaped as "", not \"
        var regex = new Regex(@"<img.*?src=[""'](.+?)[""'].*?");

        //using http://gskinner.com/RegExr/, this regex seems to match e.g. <img src="beach.png", while I need just beach.png.

        //match the regex against the html and get all the image links, like: image5.png
        //combine link (the webpage URL) with each image name
        //add each new link to the arraylist

        return imageLinks;
    }
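RegExr shows `<img src="beach.png"` because it displays the whole match; the file name on its own sits in capture group 1. A minimal sketch of the difference, assuming the question's pattern rewritten with `""` escaping so it compiles as a verbatim string (ImageSrcDemo, WholeMatch and SrcOnly are hypothetical names used only for illustration):

```csharp
using System.Text.RegularExpressions;

public static class ImageSrcDemo
{
    // The question's pattern, with "" instead of \" inside the verbatim string.
    static readonly Regex ImgSrc = new Regex(@"<img.*?src=[""'](.+?)[""'].*?");

    // Match.Value is the entire matched text, e.g. <img src="beach.png"
    public static string WholeMatch(string html) => ImgSrc.Match(html).Value;

    // Groups[1].Value is only what the first (...) captured, e.g. beach.png
    public static string SrcOnly(string html) => ImgSrc.Match(html).Groups[1].Value;
}
```

To collect every image rather than the first, loop over `ImgSrc.Matches(html)` and read `Groups[1].Value` from each match.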

I'm not sure what you want to do with each image source after extracting it.

Here is how you can extract the image links:

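using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static IEnumerable<String> GetImageLinks(String inputHTML, String someLink)
{
    // Captures the value of each <img> tag's src attribute in the named group "L"
    const string pattern = @"<img\b[^\<\>]+?\bsrc\s*=\s*[""'](?<L>.+?)[""'][^\<\>]*?\>";

    foreach (Match match in Regex.Matches(inputHTML, pattern, RegexOptions.IgnoreCase))
    {
        var imageLink = match.Groups["L"].Value;

        /* Do something with your image link here, e.g. combine it with someLink */

        yield return imageLink;
    }
}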
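For the step in the question that combines each image name with the webpage URL, one option is to let System.Uri resolve each captured source against the page URL instead of concatenating strings, so relative paths like `beach.png` or `/img/sun.jpg` and already-absolute URLs are all handled. A minimal sketch, assuming the second argument is the page's absolute URL; ImageLinkResolver and GetAbsoluteImageLinks are hypothetical names:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImageLinkResolver
{
    // Same pattern as the answer above: the src value is captured in group "L".
    const string Pattern = @"<img\b[^\<\>]+?\bsrc\s*=\s*[""'](?<L>.+?)[""'][^\<\>]*?\>";

    // Returns every <img> src in inputHTML as an absolute URL,
    // resolving relative sources against pageUrl (the URL of the webpage).
    public static List<string> GetAbsoluteImageLinks(string inputHTML, string pageUrl)
    {
        var baseUri = new Uri(pageUrl);
        var result = new List<string>();

        foreach (Match match in Regex.Matches(inputHTML, Pattern, RegexOptions.IgnoreCase))
        {
            string src = match.Groups["L"].Value;

            // TryCreate(baseUri, src, ...) resolves relative sources and keeps
            // absolute ones as they are; sources it cannot parse are skipped.
            if (Uri.TryCreate(baseUri, src, out Uri absolute))
                result.Add(absolute.AbsoluteUri);
        }
        return result;
    }
}
```

If you specifically need an ArrayList, as in the question, `new ArrayList(list)` (from System.Collections) copies the returned list into one.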

You can use the WinForms WebBrowser control to do this instead of string manipulation:

       // WebBrowser is a WinForms control: reference System.Windows.Forms and call this on an STA thread.
       private string HtmlUpdateWithImage(string stringHtml)
        {
            using (var browser = new System.Windows.Forms.WebBrowser())
            {
                browser.Navigate("about:blank");
                System.Windows.Forms.HtmlDocument doc = browser.Document;
                if (doc == null)
                    return stringHtml; // no document was created, so return the HTML unchanged

                doc.Write(stringHtml);

                if (null != browser.Document.Images && browser.Document.Images.Count > 0)
                {
                    // browser.Document.Images contains every <img> element
                    foreach (System.Windows.Forms.HtmlElement item in browser.Document.Images)
                    {
                        // Read the file path of each image...
                        string imageFilePath = item.GetAttribute("src");
                        // ...or set a new value for it
                        item.SetAttribute("src", "testPath");
                    }
                }
                return "<HTML>" + browser.Document.Body.OuterHtml + "</HTML>";
            }
        }

If you only want the image's file name, use the GetFileName() method of the Path class:

using System.IO;

string internetAddress = @"http://hello.com/a/s/s/fff.jpg";
string takeName = Path.GetFileName(internetAddress); // "fff.jpg"
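Path.GetFileName works on a URL string because it splits on '/', but it does not know about URLs, so a query string stays attached (for "http://hello.com/a/s/s/fff.jpg?v=2" you would get "fff.jpg?v=2"). A minimal sketch that takes only the path part of the URL first, assuming the input is an absolute URL; ImageNameHelper and GetImageName are hypothetical names:

```csharp
using System;
using System.IO;

public static class ImageNameHelper
{
    // Returns just the file name of an image URL, ignoring any query string or fragment.
    public static string GetImageName(string imageUrl)
    {
        var uri = new Uri(imageUrl);
        // AbsolutePath is only the path part, e.g. "/a/s/s/fff.jpg", still percent-encoded
        string encodedName = Path.GetFileName(uri.AbsolutePath);
        // Decode escapes such as %20 back to the original characters
        return Uri.UnescapeDataString(encodedName);
    }
}
```

For relative sources, resolve them against the page URL first so that `new Uri(...)` receives an absolute URL.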
