
How to get only the parent tag's text from HTML in C#

I am trying to grab the text from a tag that also contains some child tags.

For example:

<p><span>Child Text </span><span class="price">Child Text</span><br />
I need this text</p>

This is what I am trying:

// GetElementsByTagName returns a collection, so take the first <p> element.
HtmlElement menuElement = browser.Document.GetElementsByTagName("p")[0];
string mytext = menuElement.InnerHtml;   // also tried InnerText, OuterHtml, OuterText

UPDATE: I think I have to use HtmlAgilityPack, so my question now is how to do this with the HtmlAgilityPack library. I'm new to it.

Thanks

There are many approaches to this, from regular expressions to web scraping libraries. I recommend HtmlAgilityPack, which lets you address exactly what you need with XPath. Add a reference to HtmlAgilityPack and import its namespace. The code below also uses LINQ, which requires .NET 3.5 or later.

using HtmlAgilityPack;
using System.Linq;

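// These namespaces must be available.

private void Form1_Load(object sender, EventArgs e)
{
    var rawData = "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />I need this text</p>";
    var html = new HtmlAgilityPack.HtmlDocument();
    html.LoadHtml(rawData);

    // "//p/text()" selects only the text nodes that are direct children of <p>,
    // so the text inside the child <span> elements is skipped.
    // SelectNodes returns null when nothing matches.
    var textNodes = html.DocumentNode.SelectNodes("//p/text()");
    if (textNodes != null)
    {
        textNodes.ToList().ForEach(x => MessageBox.Show(x.InnerHtml));
    }
}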
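
If you want the result as a single string instead of one message box per text node, something like the following might work. It is only a sketch: the class name DirectTextSketch and the method GetDirectText are made up for illustration, and it assumes the HtmlAgilityPack package is referenced. It walks the direct children of the matched node, keeps the text nodes, and trims the whitespace around them.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack itself.
static class DirectTextSketch
{
    // Returns the text that belongs directly to the node matched by xpath,
    // ignoring text inside child elements. Returns null if nothing matches.
    public static string GetDirectText(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var node = doc.DocumentNode.SelectSingleNode(xpath);
        if (node == null)
        {
            return null;
        }

        // Keep only direct child text nodes, decode entities, drop empty pieces.
        var parts = node.ChildNodes
            .Where(n => n.NodeType == HtmlNodeType.Text)
            .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
            .Where(t => t.Length > 0);

        return string.Join(" ", parts);
    }
}
```

Calling DirectTextSketch.GetDirectText(rawData, "//p") with the markup from the question should give back just the parent's own text.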

It's much, much easier if you can put the "need this text" part inside a span with an id; then you just read that element's InnerHtml. If you can't change the markup, you can take menuElement's InnerHtml and string-match for the content after "<br />", but that's quite fragile.

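A rough sketch of that string-matching fallback, to show how little it takes and why it is fragile. The class and method names are invented for this example, and the "<br />" marker is an assumption: the browser control may serialize the tag differently (for example as "<BR>"), in which case the marker has to be adjusted.

```csharp
using System;

// Hypothetical helper for illustration only.
static class ParentTextSketch
{
    // Returns the trimmed text that follows the last "<br />" in innerHtml,
    // or null when the marker is not present.
    public static string TextAfterLastBreak(string innerHtml)
    {
        const string marker = "<br />";   // assumed serialization of the line break

        int index = innerHtml.LastIndexOf(marker, StringComparison.OrdinalIgnoreCase);
        if (index < 0)
        {
            return null;
        }

        return innerHtml.Substring(index + marker.Length).Trim();
    }
}
```

You would pass it menuElement.InnerHtml. Any markup after the line break, or a differently written break tag, makes it return the wrong thing, which is why a real parser is the safer choice.
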
You can get the text by splitting the DocumentText up into different parts.

string text = "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />I need this text</p>";
text = text.Split(new string[] { "<p><span>Child Text </span><span class=\"price\">Child Text</span><br />" }, StringSplitOptions.None)[1];
// Splitting on the leading markup and taking the second part leaves "I need this text</p>".
// The trailing </p> can be removed in many ways; here is one.
text = text.Split(new string[] { "</p>" }, StringSplitOptions.None)[0];
// text now has the value "I need this text"

Hope this helps!
