How to extract the src attribute of an HTML tag?
I have a string in HTML format:
<div class="ExternalClass6FC23FEAF7454B3A8006CF7E1D2257B8">
<audio src="/sites/audioblogs/Group2Doc/0.021950338035821915.wav" controls="controls"></audio><br/><img src="/sites/audioblogs/Group2Doc/20140103_152938.jpg" alt=""/></div>
I only need the source (src) attribute. I am currently trying to use Regex.Match;
are there any other options?
Thanks, Sachin
I would use HtmlAgilityPack to parse the HTML rather than a regular expression:
using System.Linq; // needed for FirstOrDefault

var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html); // html is your string
// Find the first <audio> element that actually has a src attribute.
var audio = doc.DocumentNode.Descendants("audio")
    .FirstOrDefault(n => n.Attributes["src"] != null);
string src = null;
if (audio != null)
    src = audio.Attributes["src"].Value;
Result: /sites/audioblogs/Group2Doc/0.021950338035821915.wav
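The snippet above returns only the first audio source, while the string in the question also carries an img tag with its own src. If every src value is wanted, the same HtmlAgilityPack calls can be applied to all descendants. This is a minimal sketch, not part of the original answer; the class name SrcCollector and the method name GetAllSrcValues are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper (name invented for this sketch).
public static class SrcCollector
{
    // Returns the src value of every node that has one, in document order.
    public static List<string> GetAllSrcValues(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode.Descendants()
            .Where(n => n.Attributes["src"] != null)   // keep nodes with a src attribute
            .Select(n => n.Attributes["src"].Value)    // project to the attribute value
            .ToList();
    }
}
```

Called with the HTML string from the question, this should yield the .wav path followed by the .jpg path.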
string yourFullHtmlstring = ".....";
// Normalize quoting: turn every double quote into a single quote
yourFullHtmlstring = yourFullHtmlstring.Replace("\"", "'");
// Split on src=' so that each entry after the first starts with a src value
string[] arr = yourFullHtmlstring.Split(new string[] { "src='" }, StringSplitOptions.None);
// Trim each entry down to the src value only.
// Start from 1 because entry 0 is the text before the first src.
for (int i = 1; i < arr.Length; i++)
{
    int end = arr[i].IndexOf("'");
    // Leave the entry untouched if no closing quote is found
    if (end >= 0)
        arr[i] = arr[i].Substring(0, end);
}
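Since the question mentions Regex.Match, here is a rough sketch of the same idea using Regex.Matches, which collects every src value without rewriting the quotes in the string first. The class and method names are hypothetical, and the pattern is an assumption about what the input looks like: it only handles quoted attribute values and is not a general HTML parser, so a real parser remains the safer choice for arbitrary markup.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper (name invented for this sketch).
public static class SrcRegexSketch
{
    // (?<![\w-])  : src must not be the tail of another name such as data-src
    // ([""'])     : group 1 captures the opening quote, single or double
    // (.*?)\1     : group 2 captures the value up to the matching closing quote
    private static readonly Regex SrcPattern = new Regex(
        @"(?<![\w-])src\s*=\s*([""'])(.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    public static List<string> ExtractSrcValues(string html)
    {
        var values = new List<string>();
        foreach (Match m in SrcPattern.Matches(html))
        {
            values.Add(m.Groups[2].Value);
        }
        return values;
    }
}
```

Usage would look like `var sources = SrcRegexSketch.ExtractSrcValues(yourFullHtmlstring);`, leaving the original string unmodified.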