How to extract the src attribute of an HTML tag?

I have a string in HTML format:

<div class="ExternalClass6FC23FEAF7454B3A8006CF7E1D2257B8">
<audio src="/sites/audioblogs/Group2Doc/0.021950338035821915.wav"   controls="controls"></audio><br/><img   src="/sites/audioblogs/Group2Doc/20140103_152938.jpg" alt=""/></div>

I only need the source (src) attributes. I am currently trying to use Regex.Match.

Are there any other options?

Thanks, Sachin

I would use HtmlAgilityPack to parse the HTML instead of a regular expression:

// requires: using System.Linq;
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);  // html is your string
// take the first <audio> element that has a src attribute
var audio = doc.DocumentNode.Descendants("audio")
    .FirstOrDefault(n => n.Attributes["src"] != null);
string src = null;
if (audio != null)
    src = audio.Attributes["src"].Value;

Result: /sites/audioblogs/Group2Doc/0.021950338035821915.wav

string yourFullHtmlstring = ".....";
// normalize quotes: turn every double quote into a single quote
yourFullHtmlstring = yourFullHtmlstring.Replace("\"", "'");

// split on src=' so that each entry after the first begins with a src value
string[] arr = yourFullHtmlstring.Split(new string[] { "src='" }, StringSplitOptions.None);

// trim each entry down to the src value only
// start from 1 because arr[0] is the text before the first src
for (int i = 1; i < arr.Length; i++)
{
    int end = arr[i].IndexOf('\'');
    // skip entries with no closing quote; Substring would throw on -1
    if (end >= 0)
        arr[i] = arr[i].Substring(0, end);
}
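
Since the question mentions Regex.Match, the same idea as the split above could also be written with Regex.Matches, which avoids rewriting the quotes first. This is only a minimal sketch, assuming every src value is quoted; `SrcExtractor` and `ExtractSrcValues` are hypothetical names made up for illustration, and a real HTML parser is likely to be more robust on arbitrary markup.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SrcExtractor  // hypothetical helper, not part of any library
{
    // Matches src="..." or src='...'. The lookbehind skips attribute names
    // that merely end in "src", such as data-src.
    // Assumption: attribute values are always quoted.
    private static readonly Regex SrcPattern = new Regex(
        @"(?<![\w-])src\s*=\s*([""'])(.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns every src value found in the string, in document order.
    public static List<string> ExtractSrcValues(string html)
    {
        var result = new List<string>();
        foreach (Match m in SrcPattern.Matches(html))
        {
            // Group 1 is the opening quote, group 2 is the value itself.
            result.Add(m.Groups[2].Value);
        }
        return result;
    }
}
```

Calling SrcExtractor.ExtractSrcValues(html) on the string from the question should return both the .wav path and the .jpg path.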

