
JSOUP HTML Parser

Is there a way to get the start line and column number and the end line and column number of an element/tag?

I am creating an HTML editor that, as a speed optimization in some scenarios, needs to highlight a tag given its start and end line and column numbers.

No, unfortunately this is not currently possible with jsoup.

At the moment Jsoup does not track line numbers / character positions when parsing, so it's not possible to extract them. As this is not a core use case, I don't want to extend the memory requirements of the DOM by retaining this data. I have thought about possibly adding an optional side-channel way to track it during the parse, in a similar way as how parse errors can be tracked, but haven't focused on implementing that yet.

Source: https://groups.google.com/forum/#!topic/jsoup/lnbYSIZApWw

Instead, you could try Jericho HTML Parser. Its list of features says:

The row and column number of each position in the source document are easily accessible.

See the Jericho javadocs and look into methods such as getRow(), getColumn(), and getRowColumnVector().
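To make the idea of a row/column lookup concrete, here is a minimal sketch in plain Java that maps a character offset in the source text to a 1-based row and column. It does not use jsoup or Jericho: the class name LineIndex and its methods row() and column() are hypothetical names chosen for this illustration, and the sketch assumes you already have the character offset of the tag (found here with String.indexOf).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper, not part of jsoup or Jericho: maps a character
 * offset in a source string to a 1-based row and column.
 */
final class LineIndex {
    private final int[] lineStarts; // offset of the first character of each line
    private final int length;       // total length of the indexed text

    LineIndex(String text) {
        List<Integer> starts = new ArrayList<>();
        starts.add(0); // the first line always starts at offset 0
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                starts.add(i + 1);
            } else if (c == '\r') {
                // Treat "\r\n" as a single line terminator.
                if (i + 1 < text.length() && text.charAt(i + 1) == '\n') {
                    i++;
                }
                starts.add(i + 1);
            }
        }
        this.lineStarts = starts.stream().mapToInt(Integer::intValue).toArray();
        this.length = text.length();
    }

    /** 1-based row of the given offset (0 <= offset <= text length). */
    int row(int offset) {
        if (offset < 0 || offset > length) {
            throw new IndexOutOfBoundsException("offset: " + offset);
        }
        int idx = Arrays.binarySearch(lineStarts, offset);
        if (idx < 0) {
            // Not an exact line start: binarySearch returns -(insertionPoint) - 1,
            // and the containing line is the one just before the insertion point.
            idx = -idx - 2;
        }
        return idx + 1;
    }

    /** 1-based column of the given offset within its row. */
    int column(int offset) {
        return offset - lineStarts[row(offset) - 1] + 1;
    }

    public static void main(String[] args) {
        String html = "<html>\n  <p>Hi</p>\n</html>";
        LineIndex index = new LineIndex(html);
        int start = html.indexOf("<p>"); // offset of the opening tag
        System.out.println(index.row(start) + ":" + index.column(start));
    }
}
```

Building the line-start table once and using a binary search keeps each lookup cheap, which matters for an editor that highlights tags repeatedly. A library that exposes begin and end offsets for each element could feed those offsets into a helper like this, though Jericho's own row and column methods make that unnecessary there.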

