Retrieve the URL from an HTML Img Tag

Background Info

I am currently working on a C# Web API that will return selected img URLs as base64. I already have the functionality that performs the base64 conversion. However, I am getting a large amount of text that also includes img URLs, and I need to extract them from the string and pass them to my function that converts the image to base64. I read up on a library (HtmlAgilityPack) that should make this task easy, but when I use it I get "HtmlDocument.cs" not found. I am not submitting a document, though; I am sending it a string of HTML. I read the docs, and the library is supposed to work with a string as well, but it is not working for me. This is the code using HtmlAgilityPack.

NON-WORKING CODE

foreach (var item in returnList)
{
    if (item.Content.Contains("~~/picture~~"))
    {
        HtmlDocument doc = new HtmlDocument();
        doc.Load(item.Content); // Load(string) treats its argument as a file path, not as HTML
        // ...
    }
}

Error Message From HtmlAgilityPack

[Screenshot of the error: "HtmlDocument.cs" not found]

Question

I am receiving a string of HTML from SharePoint. This HTML string may be tokenized with heading tokens and/or picture tokens. I am trying to isolate and retrieve the URL from the src attribute of the HTML img tag. I understand that regex may be impractical, but I would consider a regular expression if one can retrieve the URL from img src.

Sample String

Bullet~~Increased Cash Flow</li><li>~~/Document Text Bullet~~Tax Efficient Organizational Structures</li><li>~~/Document Text Bullet~~Tax Strategies that Closely Align with Business Strategies</li><li>~~/Document Text Bullet~~Complete Knowledge of State and Local Tax Obligations</li></ul><p>~~/Document Heading 2~~is the firm of choice</p><p>~~/Document Text~~When it comes to accounting and advisory services is the unique firm of choice. As a trusted advisor to our clients, we bring an integrated client service approach with dedicated industry experience. Dixon Hughes Goodman respects the value of every client relationship and provides clients throughout the U.S. with an unwavering commitment to hands-on, personal attention from our partners and senior-level professionals.</p><p>~~/Document Text~~of choice for clients in search of a trusted advisor to deal with their state and local tax needs. Through our leading best practices and experience, our SALT professionals offer quality and ease to the client engagement. We are proud to provide highly comprehensive services.</p>

    <p>~~/picture~~<br></p><p> 
          <img src="/sites/ContentCenter/Graphics/map-al.jpg" alt="map al" style="width&#58;611px;height&#58;262px;" />&#160;
    <br></p><p><br></p><p>
    ~~/picture~~<br></p><p>
          <img src="/sites/ContentCenter/Graphics/Firm_Telescope_Illustration.jpg" alt="Firm_Telescope_Illustration.jpg" style="margin&#58;5px;width&#58;155px;height&#58;155px;" />    </p><p></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div>

Important

I am working with an HTML string, not a file.

using System.Text.RegularExpressions;
string matchString = Regex.Match(original_text, "<img.+?src=[\"'](.+?)[\"'].+?>", RegexOptions.IgnoreCase).Groups[1].Value; // src of the first <img> tag
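The one-liner above uses Regex.Match, which returns only the first img tag in the string. A minimal sketch, assuming you want every src in the content, reuses the same pattern with Regex.Matches. GetImageSources is a hypothetical helper name, and the input is cut down from the sample string in the question.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: same pattern as the one-liner, but Regex.Matches returns every <img> tag.
static List<string> GetImageSources(string html)
{
    var sources = new List<string>();
    foreach (Match m in Regex.Matches(html, "<img.+?src=[\"'](.+?)[\"'].+?>", RegexOptions.IgnoreCase))
    {
        sources.Add(m.Groups[1].Value); // group 1 holds the src value
    }
    return sources;
}

// Two img tags taken from the sample string in the question.
string sample =
    "<img src=\"/sites/ContentCenter/Graphics/map-al.jpg\" alt=\"map al\" />\n" +
    "<img src=\"/sites/ContentCenter/Graphics/Firm_Telescope_Illustration.jpg\" alt=\"Firm_Telescope_Illustration.jpg\" />";

foreach (string src in GetImageSources(sample))
{
    Console.WriteLine(src);
}
```

Without RegexOptions.Singleline, `.` does not match line breaks, so an img tag whose attributes are split across lines would be missed; that is fine for the sample above, where each tag sits on one line.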

It has been asked multiple times here.

Also here.
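The question only needs images that sit under a ~~/picture~~ token (that is what its item.Content.Contains("~~/picture~~") check filters on). A minimal sketch, assuming each picture token is followed by its own img tag as in the sample string, anchors the regex on the token and captures the src of the next img tag. GetPictureSources and pictureImg are hypothetical names, and the logo.png line is made-up input added only to show an image without a picture token being skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Captures the src of the first <img> tag that follows each ~~/picture~~ token.
// RegexOptions.Singleline lets ".*?" cross the line breaks between the token and the tag.
var pictureImg = new Regex(
    "~~/picture~~.*?<img[^>]*?\\bsrc=[\"']([^\"']+)[\"']",
    RegexOptions.IgnoreCase | RegexOptions.Singleline);

// Hypothetical helper: returns the src values of picture-token images only.
List<string> GetPictureSources(string html)
{
    var sources = new List<string>();
    foreach (Match m in pictureImg.Matches(html))
    {
        sources.Add(m.Groups[1].Value); // group 1 is the src value
    }
    return sources;
}

// Abridged from the sample string; logo.png is a made-up image with no picture token.
string content =
    "<p>~~/Document Heading 2~~<img src=\"/sites/ContentCenter/Graphics/logo.png\" /></p>\n" +
    "<p>~~/picture~~<br></p><p>\n" +
    "  <img src=\"/sites/ContentCenter/Graphics/map-al.jpg\" alt=\"map al\" />\n" +
    "<br></p><p><br></p><p>\n" +
    "~~/picture~~<br></p><p>\n" +
    "  <img src=\"/sites/ContentCenter/Graphics/Firm_Telescope_Illustration.jpg\" />\n";

// Prints only the two picture-token images; logo.png is skipped.
foreach (string src in GetPictureSources(content))
{
    Console.WriteLine(src);
}
```

If a picture token were ever followed by no image, the lazy `.*?` would run on to the next img tag in the string, so this relies on SharePoint emitting each token and its image together.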

The issue you are having is that Visual Studio is looking for a file and, since it cannot find it, it tells you so. HtmlDocument.cs is a source file of the library itself, which the debugger tries to open when execution stops inside HtmlAgilityPack code; the message does not mean your HTML string is missing, and on its own it will not break your app. What needs to change is the loading call: HtmlDocument.Load(string) treats its argument as a file path, while HtmlDocument.LoadHtml(string) parses the string itself as HTML. This documentation can be found here: https://htmlagilitypack.codeplex.com/SourceControl/latest#Trunk/HtmlAgilityPackDocumentation.shfbproj. The code below is a cookie-cutter model that anyone can use.

Important

The "not found" message refers to a source file that Visual Studio cannot display; it is not a problem with the string you supplied. Once the string is passed to LoadHtml, your code works as described in the documentation provided, and the message will not affect it.

Example Code

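using System.Linq;
using HtmlAgilityPack;
var htmlDocument = new HtmlAgilityPack.HtmlDocument();
htmlDocument.LoadHtml("YourContent"); // LoadHtml parses an HTML string; Load(path) reads a file instead.
const string baseUrl = "https://example.com"; // placeholder: replace with your site's address
// SelectNodes returns null when no node matches, so fall back to an empty sequence.
foreach (HtmlNode img in htmlDocument.DocumentNode.SelectNodes("//img[@src]") ?? Enumerable.Empty<HtmlNode>())
{
    HtmlAttribute att = img.Attributes["src"];
    var imgUrl = new System.Uri(baseUrl + att.Value); // build your url, then pass it to your base64 conversion
}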
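The "build your url" step joins a base address and att.Value by string concatenation. The src values in the sample string are root-relative paths (they start with /sites/...), so a minimal sketise alternative, assuming you know your site's base address, resolves each path with the System.Uri(Uri, Uri) constructor. https://example.com/ is a placeholder, not an address from the original.

```csharp
using System;

// Placeholder base address: replace with the address of your SharePoint site.
var siteRoot = new Uri("https://example.com/");

// A src value exactly as it appears in the sample HTML (a root-relative path).
string src = "/sites/ContentCenter/Graphics/map-al.jpg";

// Marking src explicitly relative avoids any platform-specific guessing about "/..." strings.
// Because src starts with "/", it replaces whatever path siteRoot already has.
var imgUrl = new Uri(siteRoot, new Uri(src, UriKind.Relative));

Console.WriteLine(imgUrl.AbsoluteUri); // prints https://example.com/sites/ContentCenter/Graphics/map-al.jpg
```

Resolving against a Uri rather than concatenating strings means a missing or doubled slash between the base address and the path cannot produce a malformed URL.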
