
How to extract a URL from a string in C#

I have this string :

 "<figure><img
 src='http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg'
 href='JavaScript:void(0);' onclick='return takeImg(this)'
 tabindex='1' class='myclass' width='55' height='66' alt=\"myalt\"></figure>"

How can I retrieve this link :

http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg

All the strings have the same format, so somehow I need to get the substring between src= and href, but I don't know how to do that. Thanks.

If you parse HTML, don't use string methods; use a real HTML parser like HtmlAgilityPack:

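var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(html);  // html is your string
// SelectNodes yields element nodes rather than attribute nodes, so select the
// <img> elements that carry a src attribute and read the attribute from each.
// Note: SelectNodes returns null when nothing matches.
var images = doc.DocumentNode.SelectNodes("//img[@src]");
var allSrcList = images
    .Select(node => node.GetAttributeValue("src", "[src not found]"))
    .ToList();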

You can use a regular expression:

var src = Regex.Match("the string", "<img.+?src=[\"'](.+?)[\"'].*?>", RegexOptions.IgnoreCase).Groups[1].Value;

Generally, when parsing values out of HTML code you should use an HTML/XML parser, but for a limited string like this one, Regex is fine.

string url = Regex.Match(htmlString, @"src='(.*?)'").Groups[1].Value;
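
The regex one-liners above return an empty string when nothing matches, which can hide a failed extraction. As a minimal sketch (not a replacement for a real HTML parser), the pattern can be wrapped in a helper that accepts either quote style and reports whether a match was found. The names SrcExtractor and TryGetImgSrc are made up for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SrcExtractor
{
    // Matches src='...' or src="..." inside an <img> tag. The backreference \k<q>
    // requires the closing quote to be the same character as the opening one.
    private static readonly Regex ImgSrc = new Regex(
        @"<img\b[^>]*?\bsrc\s*=\s*(?<q>['""])(?<url>.*?)\k<q>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Hypothetical helper: returns true and sets url when an <img ... src=...> is found;
    // otherwise returns false and sets url to an empty string.
    public static bool TryGetImgSrc(string html, out string url)
    {
        Match m = ImgSrc.Match(html ?? string.Empty);
        url = m.Success ? m.Groups["url"].Value : string.Empty;
        return m.Success;
    }
}
```

A caller would check the return value, for example `if (SrcExtractor.TryGetImgSrc(htmlString, out string url)) { ... }`, instead of testing for an empty string afterwards.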

If your string is always in the same format, you can easily do it like this:

string input =  "<figure><img src='http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg' href='JavaScript:void(0);' onclick='return takeImg(this)' tabindex='1' class='myclass' width='55' height='66' alt=\"myalt\"></figure>";
// The link sits between the first two ' characters, so locate both of them:
int start = input.IndexOf("'") + 1;    // first character after the opening '
int end = input.IndexOf("'", start);   // position of the closing '
input = input.Substring(start, end - start);
// now your string looks like : "http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg"

return input;

string str = "<figure><imgsrc = 'http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg'href = 'JavaScript:void(0);' onclick = 'return takeImg(this)'tabindex = '1' class='myclass' width='55' height='66' alt=\"myalt\"></figure>";

int pFrom = str.IndexOf("src = '") + "src = '".Length;
int pTo = str.LastIndexOf("'href");

string url = str.Substring(pFrom, pTo - pFrom);

Source :

Get string between two strings in a string

Here q is your string. I look for the index of the attribute you want (src = '), then remove the first few characters (7, including spaces), and after that find where the text ends by looking for the next '.

For removing the first few characters, you could use .IndexOf to work out how many to delete, so that the number is not hard-coded.

        string q =
            "<figure><img src = 'http://myphotos.net/image.ashx?type=2&image=Images\\2\\9\\11\\12\\3\\8\\4\\7\\685621455625.jpg' href = 'JavaScript:void(0);' onclick = 'return takeImg(this)'" +
            "tabindex = '1' class='myclass' width='55' height='66' alt=\"myalt\"></figure>";
        string z = q.Substring(q.IndexOf("src = '"));
        z = z.Substring(7);
        z = z.Substring(0, z.IndexOf("'"));
        MessageBox.Show(z);

This is certainly not the most elegant way (look at the other answers for that :)).
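
The substring answers above all follow the same pattern: find a start marker, skip past it, then cut at the next end marker. A minimal sketch of that pattern without the hard-coded 7, assuming the markers are passed in by the caller (StringBetween and Between are hypothetical names, not part of any library):

```csharp
using System;

public static class StringBetween
{
    // Hypothetical helper: returns the text between the first occurrence of
    // startMarker and the next occurrence of endMarker after it,
    // or null if either marker is missing.
    public static string Between(string source, string startMarker, string endMarker)
    {
        int markerAt = source.IndexOf(startMarker, StringComparison.Ordinal);
        if (markerAt < 0) return null;

        // Skip the marker itself by using its length instead of a fixed number.
        int from = markerAt + startMarker.Length;
        int to = source.IndexOf(endMarker, from, StringComparison.Ordinal);
        if (to < 0) return null;

        return source.Substring(from, to - from);
    }
}
```

With the string q from the answer above, `StringBetween.Between(q, "src = '", "'")` would give the same result as the three Substring calls. The marker still has to match the spacing used in the input exactly, so this only suits strings whose format is known in advance.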
