How to get a substring from an HTML element string?

I have an ASP.NET project on the server side.

I have this string: <img src="../../SpatialData/sometext/813.jpg" style="width:190px">

At some point I need to extract the src value from the string:

../../SpatialData/sometext/813.jpg  

How can I get this substring using C#?

You can use a regular expression to solve this:

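using System;
using System.Text.RegularExpressions;

var test = "<img src=\"../../SpatialData/sometext/813.jpg\" style=\"width:190px\">";

// Capture everything after src=" up to (but not including) the next double quote.
var pattern = @"<img src=""([^""]*)";

// Groups[1] holds the captured src value.
var result = Regex.Match(test, pattern).Groups[1].Value;

Console.WriteLine(result);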

The catch is that Regex.Match returns only the first match. To get every src from an HTML document with multiple image tags, use Regex.Matches instead:

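// Requires `using System.Linq;` at the top of the file for Cast and Select.
test = "<img src=\"../../SpatialData/sometext/813.jpg\" style=\"width:190px\"><img src=\"../../SpatialData/sometext/814.jpg\" style=\"width:190px\">";

// Regex.Matches returns every match; project each one to its captured src value.
var matches = Regex.Matches(test, pattern)
                   .Cast<Match>()
                   .Select(x => x.Groups[1].Value);

foreach (var m in matches)
{
    Console.WriteLine(m);
}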

As someone has already mentioned, HTML Agility Pack is an option worth looking into. The solution above is very rigid: it only matches tags where src is the first attribute, directly after <img. If the attributes on an image tag were in a different order, that element would not be included in the results.
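
If pulling in a parser is not an option, the attribute-order rigidity can be loosened somewhat with a more tolerant pattern. The following is only a sketch: the class name ImgSrcExtractor and the method GetImageSources are hypothetical names made up for illustration, and it assumes the src value is quoted with either double or single quotes. It is still a regular expression rather than a real HTML parser, so unquoted values, comments, and malformed markup are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class ImgSrcExtractor
{
    // <img\b        : the start of an img tag
    // [^>]*?\s      : any other attributes (lazily), then whitespace before src
    // src\s*=\s*    : the attribute name and equals sign, with optional spaces
    // ([""'])       : group 1, the opening quote character (" or ')
    // (.*?)\1       : group 2, the value, up to the matching closing quote
    private static readonly Regex SrcPattern = new Regex(
        @"<img\b[^>]*?\ssrc\s*=\s*([""'])(.*?)\1",
        RegexOptions.IgnoreCase);

    // Returns the src value of every img tag found in the input, in document order.
    public static List<string> GetImageSources(string html)
    {
        return SrcPattern.Matches(html)
                         .Cast<Match>()
                         .Select(m => m.Groups[2].Value)
                         .ToList();
    }
}
```

Calling ImgSrcExtractor.GetImageSources(test) on the two-image string above would return both paths, and it would still find them if style came before src. Requiring whitespace directly before src is a deliberate choice so that attributes such as data-src are not picked up by mistake.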
