
Jsoup split block element to preserve order of text and child elements

Let's say I have an HTML fragment like this:

<p>Generally speaking, in the U.S., if you want to <a href="\&quot;https://web.archive.org/web/20210408195204/https://www.wsj.com/articles/investors-big-and-small-are-driving-stock-gains-with-borrowed-money-11617799940\&quot;"> 
  borrow money from your broker to buy stocks</a>, you are capped at 2-to-1 leverage. If you have $100, you can buy $200 worth of stock. Back in the olden days, you could have bought&nbsp;$300 or $500 or $1,000 of stock with your $100, borrowing the rest from your broker, but then a Great Depression happened and regulators clamped down on margin lending. </p>

I've parsed it with jsoup, and it's represented as an Element.

I'd like to be able to split the element into something like:

  • A fragment of text: "Generally speaking, in the U.S., if you want to "
  • An Element: <a href="\&quot;https://web.archive.org/web/20210408195204/https://www.wsj.com/articles/investors-big-and-small-are-driving-stock-gains-with-borrowed-money-11617799940\&quot;">
  • Another fragment of text: "borrow money from your broker to buy stocks</a>, you are capped at 2-to-1 leverage. If you have $100, you can buy $200 worth of stock. Back in the olden days, you could have bought&nbsp;$300 or $500 or $1,000 of stock with your $100, borrowing the rest from your broker, but then a Great Depression happened and regulators clamped down on margin lending"

And I'd like to do this while preserving the order of those pieces.

Things I've looked at so far:

  • getAllElements() only returns the p tag itself at index 0, followed by the element for the a tag; the text fragments are not included.
  • children() only returns one element, the one for the a tag.

What you want is the childNodes() method. It returns a List<Node> containing both the text nodes and the child elements, in document order. You can work with these nodes and access their attributes and parents, but you cannot call select() on them to reach deeper tags, because only the Element class has that method. A node that is actually an Element can be cast back to Element if you need select().
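
As a rough sketch of the childNodes() approach described above: the loop below walks the direct children of a parsed element and tells text apart from elements with instanceof checks. The class name SplitParagraph, the helper describeChildren, and the shortened example.com fragment are assumptions made for illustration; they are not part of the original question or of jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

public class SplitParagraph {

    // Hypothetical helper: describes each direct child of `parent`
    // in document order, keeping text and elements interleaved.
    static List<String> describeChildren(Element parent) {
        List<String> parts = new ArrayList<>();
        for (Node node : parent.childNodes()) {
            if (node instanceof TextNode) {
                // getWholeText() returns the text without whitespace normalisation
                parts.add("TEXT: " + ((TextNode) node).getWholeText());
            } else if (node instanceof Element) {
                // Cast back to Element to regain methods such as select() and text()
                Element child = (Element) node;
                parts.add("ELEMENT <" + child.tagName() + ">: " + child.text());
            }
            // Other node types (comments, etc.) are skipped in this sketch
        }
        return parts;
    }

    public static void main(String[] args) {
        // Shortened stand-in for the fragment in the question (assumed URL)
        String html = "<p>If you want to <a href=\"https://example.com/\">borrow money</a>,"
                + " you are capped at 2-to-1 leverage.</p>";
        Element p = Jsoup.parseBodyFragment(html).body().child(0);
        for (String part : describeChildren(p)) {
            System.out.println(part);
        }
    }
}
```

Run against the shortened fragment, this should yield three entries in order: the leading text, the a element, and the trailing text. Note that childNodes() only returns direct children, so anything nested inside the a element would need a recursive walk or a select() call on the cast Element.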
