
How to extract plain text of specified length from html using Jsoup?

I am using jsoup-1.5.2 to parse a string of HTML. I want to extract plain text up to a specified length from that string while keeping the surrounding HTML tags intact.

Example:

html code:

<p><span>Mike <u>stopp<b>ed</b></u> his work</span></p>

The results I want:

specify text length=4

result:<p><span>Mike</span></p>

specify text length=10

result:<p><span>Mike <u>stopp</u></span></p>

specify text length=12

result:<p><span>Mike <u>stopp<b>ed</b></u></span></p>

specify text length=16

result:<p><span>Mike <u>stopp<b>ed</b></u> his</span></p>

etc.

Can I do this using jsoup?

Unfortunately, this is not straightforward using the Element class. The reason is that the 'text()' method of class Element "Gets the combined text of this element and all its children", so it does not give you the text of a single element on its own. You can instead call select(String), perhaps with a wildcard selector (if possible), and then call text() on the Elements it returns. That method returns the 'combined' text of all matching nodes as a single string, so you can then call String's 'length()' method on it.
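For the truncation itself, one possible approach is to walk the parsed tree and spend a character budget on the text nodes, removing everything that comes after the budget runs out. The sketch below is only an illustration under some assumptions: the class name HtmlTruncator and the methods truncate and prune are made up for this example, it is written against the current jsoup API (Jsoup.parseBodyFragment, Node.childNodes(), Node.remove(), TextNode.getWholeText(), TextNode.text(String)) and has not been checked against 1.5.2, and length is counted in Java chars, including whitespace.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of jsoup.
final class HtmlTruncator {

    // Returns the HTML fragment cut down to at most maxChars characters of text.
    static String truncate(String html, int maxChars) {
        Document doc = Jsoup.parseBodyFragment(html);
        // Turn off pretty-printing so the output keeps the original whitespace.
        doc.outputSettings().prettyPrint(false);
        prune(doc.body(), new int[] { maxChars });
        return doc.body().html();
    }

    // Depth-first walk; remaining[0] is the text budget shared across the recursion.
    private static void prune(Node node, int[] remaining) {
        // Copy the child list, because nodes are removed while iterating.
        List<Node> children = new ArrayList<Node>(node.childNodes());
        for (Node child : children) {
            if (remaining[0] <= 0) {
                // Budget already spent: drop this node and its whole subtree.
                child.remove();
            } else if (child instanceof TextNode) {
                TextNode textNode = (TextNode) child;
                String text = textNode.getWholeText();
                if (text.length() > remaining[0]) {
                    // Keep only the part of the text that still fits.
                    textNode.text(text.substring(0, remaining[0]));
                }
                remaining[0] -= Math.min(text.length(), remaining[0]);
            } else {
                // Elements (and other nodes) are kept; only their text counts.
                prune(child, remaining);
            }
        }
    }
}
```

With the HTML from the question, HtmlTruncator.truncate(html, 10) should give <p><span>Mike <u>stopp</u></span></p>, and the other lengths listed should behave the same way. Elements that carry no text (for example an <img>) are kept as long as they appear before the budget is used up.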
