Jsoup split block element to preserve order of text and child elements

Suppose I have an HTML fragment like this:

<p>Generally speaking, in the U.S., if you want to <a href="\&quot;https://web.archive.org/web/20210408195204/https://www.wsj.com/articles/investors-big-and-small-are-driving-stock-gains-with-borrowed-money-11617799940\&quot;"> 
  borrow money from your broker to buy stocks</a>, you are capped at 2-to-1 leverage. If you have $100, you can buy $200 worth of stock. Back in the olden days, you could have bought&nbsp;$300 or $500 or $1,000 of stock with your $100, borrowing the rest from your broker, but then a Great Depression happened and regulators clamped down on margin lending. </p>

I have parsed it with jsoup and have it represented as an Element.

I would like to be able to split the element into:

  • A piece of text: "Generally speaking, in the U.S., if you want to "
  • An element: <a href="\&quot;https://web.archive.org/web/20210408195204/https://www.wsj.com/articles/investors-big-and-small-are-driving-stock-gains-with-borrowed-money-11617799940\&quot;">
  • Another piece of text: "borrow money from your broker to buy stocks</a>, you are capped at 2-to-1 leverage. If you have $100, you can buy $200 worth of stock. Back in the olden days, you could have bought&nbsp;$300 or $500 or $1,000 of stock with your $100, borrowing the rest from your broker, but then a Great Depression happened and regulators clamped down on margin lending"

and to do so while preserving the order of these parts.

Things I have looked at so far:

  • getAllElements() only returns the p tag itself at index 0, followed by the element for the a tag.
  • children() only returns 1 element, the one for the a tag.

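What you want is the childNodes() method. It returns a List<Node> containing the text nodes and the child elements in document order. You can work with these nodes and access their attributes and parent nodes, but you cannot call select() on them to reach deeper tags, because only the Element class has that method.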
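
As a rough illustration of the childNodes() approach, here is a minimal sketch. The class name SplitParagraph, the helper splitInOrder, and the shortened sample HTML with its example.com link are assumptions made up for this example, not part of the question. It walks the direct children of the p element in order and separates TextNode instances from Element instances; once a node has been checked with instanceof and cast to Element, methods such as select() should be available on it again.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class SplitParagraph {

    // Hypothetical helper: describes each direct child node of the given
    // element, keeping the order in which the nodes appear in the document.
    static List<String> splitInOrder(Element element) {
        List<String> parts = new ArrayList<>();
        for (Node node : element.childNodes()) {
            if (node instanceof TextNode) {
                // Plain text that sits directly inside the parent element.
                parts.add("TEXT: " + ((TextNode) node).text());
            } else if (node instanceof Element) {
                // A child tag such as <a>; after the cast, Element-only
                // methods (select(), attr(), text(), ...) can be used.
                Element child = (Element) node;
                parts.add("ELEMENT <" + child.tagName() + ">: " + child.text());
            }
            // Other node types (comments, for example) are skipped here.
        }
        return parts;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the paragraph in the question.
        String html = "<p>If you want to <a href=\"https://example.com\">borrow money</a>"
                + ", you are capped at 2-to-1 leverage.</p>";
        Element p = Jsoup.parseBodyFragment(html).selectFirst("p");
        for (String part : splitInOrder(p)) {
            System.out.println(part);
        }
    }
}
```

For the sample paragraph this should yield three parts: the leading text, the a element, and the trailing text. Note that the link text belongs to the a element itself rather than to the text node that follows it.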
