C# Regex img src
I want to get the links to the images from the img src attributes in some HTML. I have the HTML as a string, which I pass into a method that returns an ArrayList of the image URLs.
Into the method I pass the HTML string and the URL of the web page.
I need some help with the regex to get the image name with its extension. If you can also help with matching it against the HTML string, that would be a bonus. I will accept the right answer or the one closest to it. Thank you all.
I have heard about HTML parsers, but I would rather do it this way, thank you.
Here is my method:
private ArrayList GetImageLinks(String inputHTML, String link)
{
    ArrayList imageLinks = new ArrayList();
    // In a verbatim string (@"..."), a double quote is written as "" rather than \"
    var regex = new Regex(@"<img.*?src=[""'](.+?)[""'].*?");
    // Using http://gskinner.com/RegExr/ this regex seems to get: <img src="beach.png" for example, while I need just beach.png.
    // match the regex to the html and get all the image links like: image5.png
    // link = inputHTML + link
    // add new link to arraylist
    return imageLinks;
}
I did not understand what you want to do with the image source after extracting it.
Here is how you can extract the image links.
// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
static IEnumerable<String> GetImageLinks(String inputHTML, String someLink)
{
    // The named group "L" captures only the value of the src attribute.
    const string pattern = @"<img\b[^\<\>]+?\bsrc\s*=\s*[""'](?<L>.+?)[""'][^\<\>]*?\>";
    foreach (Match match in Regex.Matches(inputHTML, pattern, RegexOptions.IgnoreCase))
    {
        var imageLink = match.Groups["L"].Value;
        /* Do something with your image link here */
        yield return imageLink;
    }
}
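The question also passes in the URL of the web page, presumably so that relative src values such as beach.png can be turned into full links. The following is a minimal sketch of that step, reusing the pattern from the answer above. The class name ImageLinkExtractor and the method name GetAbsoluteImageLinks are made up for illustration, and the sketch assumes pageUrl is a valid absolute URL.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ImageLinkExtractor
{
    // Same pattern as in the answer above; the named group "L" captures the src value.
    private const string Pattern =
        @"<img\b[^\<\>]+?\bsrc\s*=\s*[""'](?<L>.+?)[""'][^\<\>]*?\>";

    // Hypothetical helper: returns absolute image URLs, resolving relative
    // src values against pageUrl (assumed to be an absolute URL).
    public static List<string> GetAbsoluteImageLinks(string inputHtml, string pageUrl)
    {
        var baseUri = new Uri(pageUrl);
        var links = new List<string>();

        foreach (Match match in Regex.Matches(inputHtml, Pattern, RegexOptions.IgnoreCase))
        {
            string src = match.Groups["L"].Value;

            // Uri.TryCreate(Uri, string, out Uri) resolves a relative reference
            // against the base; an src that is already absolute is kept as it is.
            if (Uri.TryCreate(baseUri, src, out Uri absolute))
            {
                links.Add(absolute.AbsoluteUri);
            }
        }

        return links;
    }
}
```

Letting Uri do the resolution avoids concatenating the page URL and the src by hand, which tends to go wrong for paths like ../img/a.png or for src values that are already absolute.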
You can use the WebBrowser control to do that instead of string manipulation:
private string HtmlUpdateWithImage(string stringHtml)
{
    System.Windows.Forms.WebBrowser browser = new System.Windows.Forms.WebBrowser();
    // Load an empty page so that browser.Document is available, then write the HTML into it.
    browser.Navigate("about:blank");
    HtmlDocument doc = browser.Document;
    doc.Write(stringHtml);
    if (null != browser.Document && null != browser.Document.Images && browser.Document.Images.Count > 0)
    {
        // browser.Document.Images holds every <img> element of the document
        foreach (System.Windows.Forms.HtmlElement item in browser.Document.Images)
        {
            // Read the src of each image
            string imageFilePath = item.GetAttribute("src");
            // Or set a new value for it
            item.SetAttribute("src", "testPath");
        }
    }
    return "<HTML>" + browser.Document.Body.OuterHtml + "</HTML>";
}
If you just want the name of the image, use the GetFileName() method of the Path class:
string internetAddress = @"http://hello.com/a/s/s/fff.jpg";
string takeName = Path.GetFileName(internetAddress); // "fff.jpg"
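Path.GetFileName() only splits on directory separators, so a query string or fragment in the address (for example fff.jpg?width=100) would typically end up in the result. A small sketch of one way around that, assuming the address is an absolute URL, is to take the path part from Uri first. The names ImageNames and GetImageFileName are made up for illustration.

```csharp
using System;
using System.IO;

public static class ImageNames
{
    // Hypothetical helper: returns the file name part of an absolute image URL,
    // ignoring any "?query" or "#fragment" that follows the path.
    public static string GetImageFileName(string absoluteUrl)
    {
        var uri = new Uri(absoluteUrl);

        // For an http(s) URL, LocalPath is the unescaped path only,
        // e.g. "/a/s/s/fff.jpg" for "http://hello.com/a/s/s/fff.jpg?width=100".
        return Path.GetFileName(uri.LocalPath);
    }
}
```

For example, ImageNames.GetImageFileName("http://hello.com/a/s/s/fff.jpg?width=100") yields fff.jpg.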