
How do I output individual <p> tags to a rich text box using HTML Agility Pack?

I'm just learning how to use HTML Agility Pack to scrape text from webpages. I want to get the biographies of the heroes in Overwatch by Blizzard from their site. I'm currently using this to find the desired text and write it to a rich text box.

var paragraphs = page.DocumentNode.SelectNodes("//div[@class='hero-bio-backstory pad-sm']");

foreach (HtmlNode node in paragraphs)
{
    rchTxtBox.AppendText(node.InnerText);
    rchTxtBox.AppendText("\n");
}

What I am trying to get is the InnerText of each <p>, with a line break between them.

<div class="hero-bio-backstory pad-sm">
     <p>...</p>
     <p>...</p>
     <p>...</p>
     <p>...</p>
</div>

Instead of outputting each paragraph with a return character between them, it is writing all of them into one solid chunk. Is there a way to do this?

Your selector //div[@class='hero-bio-backstory pad-sm'] returns one node: the entire div. When you then call InnerText on that node, it returns the text of the whole div, sans markup. That is why you see the behavior you describe: your loop runs once, appends all the text in one chunk, then adds a single trailing newline.

You need an XPath expression that selects the p nodes themselves, i.e. //div[@class='hero-bio-backstory pad-sm']/p. The loop then runs once per paragraph, so the newline you append ends up between paragraphs.
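As a rough sketch of that change, the paragraph extraction could be pulled into a small helper. The class name BioExtractor and the method GetParagraphs are made up for illustration, and the HTML is assumed to already be available as a string. The null check is there because SelectNodes returns null rather than an empty collection when nothing matches, and HtmlEntity.DeEntitize is used because InnerText leaves entities such as &amp; encoded.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper, not part of the original question or of HTML Agility Pack.
public static class BioExtractor
{
    // Returns the text of each <p> that is a direct child of the backstory div.
    public static List<string> GetParagraphs(string html)
    {
        var page = new HtmlDocument();
        page.LoadHtml(html);

        var result = new List<string>();

        // Select the <p> children instead of the enclosing <div>.
        HtmlNodeCollection nodes = page.DocumentNode.SelectNodes(
            "//div[@class='hero-bio-backstory pad-sm']/p");

        // SelectNodes returns null when no node matches the expression.
        if (nodes == null)
        {
            return result;
        }

        foreach (HtmlNode node in nodes)
        {
            // Decode HTML entities and trim surrounding whitespace.
            result.Add(HtmlEntity.DeEntitize(node.InnerText).Trim());
        }

        return result;
    }
}
```

In the form code, you would then loop over BioExtractor.GetParagraphs(html) and call rchTxtBox.AppendText(text) followed by rchTxtBox.AppendText("\n") for each entry, exactly as in the original loop. Note that @class='...' is an exact string match, so this only works while the div's class attribute is exactly "hero-bio-backstory pad-sm".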
