简体   繁体   中英

Retrive the Url from an Html Img Tag

BackGround Info

Currently working on a C# web api that will be returning selected Img url's as base64. I currently have the functionality that would preform the base64 conversion however, I am getting a large amount of text which also include Img Url's which I will need to crop out of the string and give it to my function to convert the img to base 64. I read up on an lib.("HtmlAgilityPack;") that should make this task easy but when I am use it I get "HtmlDocument.cs" not found. However, I am not submitting a document, but sending it a string which is HTML. I read the doc and it is suppose to work with a string as well, but it is not working for me. This is the code using "HtmlAgilityPack".

NON WORKING CODE

foreach(var item in returnList)
                    {
                         if (item.Content.Contains("~~/picture~~"))
                        {
                            HtmlDocument doc = new HtmlDocument();
                            doc.Load(item.Content);

Error Message From HtmlAgilityPack

在此处输入图片说明

Question I am receiving a string which is Html from SharePoint. This Html string may be tokenized with heading tokens and/or picture tokens. I am trying to isolate the retrieve the html from the img src Hmtl tag. I understand that regex may be impractical, but I would consider working with a regex expressions is it available to retrieve the url from img src.

Sample String

Bullet~~Increased Cash Flow</li><li>~~/Document Text Bullet~~Tax Efficient Organizational Structures</li><li>~~/Document Text Bullet~~Tax Strategies that Closely Align with Business Strategies</li><li>~~/Document Text Bullet~~Complete Knowledge of State and Local Tax Obligations</li></ul><p>~~/Document Heading 2~~is the firm of choice</p><p>~~/Document Text~~When it comes to accounting and advisory services is the unique firm of choice. As a trusted advisor to our clients, we bring an integrated client service approach with dedicated industry experience. Dixon Hughes Goodman respects the value of every client relationship and provides clients throughout the U.S. with an unwavering commitment to hands-on, personal attention from our partners and senior-level professionals.</p><p>~~/Document Text~~of choice for clients in search of a trusted advisor to deal with their state and local tax needs. Through our leading best practices and experience, our SALT professionals offer quality and ease to the client engagement. We are proud to provide highly comprehensive services.</p>

    <p>~~/picture~~<br></p><p> 
          <img src="/sites/ContentCenter/Graphics/map-al.jpg" alt="map al" style="width&#58;611px;height&#58;262px;" />&#160;
    <br></p><p><br></p><p>
    ~~/picture~~<br></p><p>
          <img src="/sites/ContentCenter/Graphics/Firm_Telescope_Illustration.jpg" alt="Firm_Telescope_Illustration.jpg" style="margin&#58;5px;width&#58;155px;height&#58;155px;" />    </p><p></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div><div class="ExternalClassAF0833CB235F437993D7BEE362A1A88A"><br></div>

Important

I am working with an HTML string, not a file.

string matchString = Regex.Match(original_text, "<img.+?src=[\"'](.+?)[\"'].+?>", RegexOptions.IgnoreCase).Groups[1].Value;

It has been asked multiple times here .

also here

The issue you are having is that C# is looking for a file and since it is not finding it, it tells you. This is not an error that will brake your app, it is just telling you that the file is not found and the Lib will than read the string given. This documentation can be found here https://htmlagilitypack.codeplex.com/SourceControl/latest#Trunk/HtmlAgilityPackDocumentation.shfbproj . The code below is a cookie cutter model that anyone can use.

Important

C# is looking for a file which can not be displayed, because it a string that is supplied. That is the message that you are getting, however your still will work as well with accordance to the doc provided and will not effect your code.

Exmample Code

HtmlAgilityPack.HtmlDocument htmlDocument = new HtmlAgilityPack.HtmlDocument();
htmlDocument.LoadHtml("YourContent"); // can be a string or can be a path.

HtmlAttribute att = url.Attributes["src"];
Uri imgUrl = new System.Uri("Url"+ att.Value); // build your url

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM