
How do I output individual <p> tags using HTML Agility Pack to a rich text box?

I'm just learning how to use HTML Agility Pack to scrape text off of webpages. I want to get the biographies of the heroes in Blizzard's Overwatch from their site. I'm currently using this to find the desired text and write it to a rich text box.

var paragraphs = page.DocumentNode.SelectNodes("//div[@class='hero-bio-backstory pad-sm']");

foreach (HtmlNode node in paragraphs)
{
    rchTxtBox.AppendText(node.InnerText);
    rchTxtBox.AppendText("\n");
}

What I am trying to get is the InnerText of each <p>, with a line break between them.

<div class="hero-bio-backstory pad-sm">
     <p>...</p>
     <p>...</p>
     <p>...</p>
     <p>...</p>
</div>

Instead of outputting each paragraph with a newline character between them, it writes all of them as one solid chunk. Is there a way to do this?

Your selector //div[@class='hero-bio-backstory pad-sm'] is returning one node: the entire div. When you then call InnerText on this node, it returns the text of the entire div, sans markup. That is why you are seeing the behavior you describe: your loop runs once, appends all the text in one chunk, then adds a single trailing newline.

You need to use an XPath expression that selects all the p nodes inside that div, i.e. //div[@class='hero-bio-backstory pad-sm']/p. The loop then runs once per paragraph, so each InnerText is followed by its own newline.
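As a rough illustration of that change, here is a minimal sketch. The class name HeroBio and the method ExtractParagraphs are hypothetical names introduced for this example, and it assumes the page markup matches the snippet in the question. It also guards against SelectNodes returning null when nothing matches, and decodes HTML entities, neither of which the original code did.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, not part of HTML Agility Pack.
public static class HeroBio
{
    // Returns the text of each <p> that is a direct child of the backstory div.
    public static List<string> ExtractParagraphs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // The trailing "/p" selects the paragraphs instead of the whole div.
        var nodes = doc.DocumentNode.SelectNodes(
            "//div[@class='hero-bio-backstory pad-sm']/p");

        var paragraphs = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        if (nodes == null)
        {
            return paragraphs;
        }

        foreach (HtmlNode p in nodes)
        {
            // InnerText keeps entities such as &amp; so decode them before display.
            paragraphs.Add(HtmlEntity.DeEntitize(p.InnerText).Trim());
        }

        return paragraphs;
    }
}
```

In the form code, each returned string could then be passed to rchTxtBox.AppendText, followed by a "\n" (or two, if a blank line between paragraphs is wanted).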
