Java: add attribute to HTML tags without changing formatting

I have a task to write a Maven plugin that takes HTML files in a certain location and adds a service attribute to each tag that doesn't have it. This is done on the source code, which means my colleagues and I will have to edit those files further.

As a first solution I turned to Jsoup, which seems to do the job but has one small yet annoying problem: if a tag has multiple long attributes (ours often do, as this HTML is a source for further processing), we wrap the lines like this:

<ui:grid id="category_search" title="${handler.getMessage( 'title' )}" 
        class="is-small is-outlined is-hoverable is-foldable"
        filterListener="onApplyFilter" paginationListener="onPagination" ds="${handler.ds}" 
        filterFragment="grid_filter" contentFragment="grid_contents"/>

However, Jsoup turns this into one very long line:

<ui:grid id="category_search" title="${handler.getMessage( 'title' )}" class="is-small is-outlined is-hoverable is-foldable" filterListener="onApplyFilter" paginationListener="onPagination" ds="${handler.ds}" filterFragment="grid_filter" contentFragment="grid_contents"/>

This is bad practice and a real pain to read and edit.

So is there any other, not too convoluted, way to add this attribute without parsing and recomposing the HTML, or perhaps some way to preserve the line breaks inside the tag?

Unfortunately, Jsoup's main use case is not producing HTML that is read or edited by humans. Specifically, Jsoup's API is closely modeled after the DOM, which has no way to store or model line breaks inside tags, so it cannot preserve them.

I can think of only two solutions:

  1. Find (or write) an alternative HTML parser library with an API that preserves formatting inside tags. I'd be surprised if such a thing already exists.

  2. Run the generated code through a formatter that supports wrapping inside tags. This won't preserve the original line breaks, but at least the attributes won't all be on one line. I wasn't able to find a Java library that does that, so you may need to consider using an external program.

It seems there is no good way to preserve line breaks inside tags while parsing them into POJOs (or I haven't found one), so I wrote a simple tokenizer that splits the incoming HTML string into parts, roughly like this:

String[] parts = html.split( "((?=<)|(?<=>))" );

This uses regex lookaround assertions (a lookahead and a lookbehind) to split before < and after >. Then just iterate over the parts and decide whether to insert the attribute or not.
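
The iteration step could look roughly like the sketch below. It is not the original plugin code: the class name `AttributeInserter`, the method `addAttribute`, and the attribute name `data-service` are all made up for illustration. It also assumes that attribute values never contain a literal < or >, and that the files contain no comments, scripts, or CDATA sections that would confuse the split.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeInserter {

    // Same split as above: before every '<' and after every '>'.
    private static final Pattern TAG_BOUNDARY = Pattern.compile("(?=<)|(?<=>)");

    // '<' followed by a tag name; does not match "</...", "<!..." or "<?...".
    private static final Pattern TAG_NAME = Pattern.compile("^<[A-Za-z][^\\s/>]*");

    // Hypothetical helper: adds name="value" to every opening tag that lacks it.
    public static String addAttribute(String html, String name, String value) {
        // Crude presence check: the name preceded by whitespace and followed
        // by whitespace, '=', '/' or '>'. It can be fooled by the same text
        // appearing inside another attribute's value.
        Pattern existing = Pattern.compile("\\s" + Pattern.quote(name) + "(?=[\\s=/>])");
        StringBuilder out = new StringBuilder(html.length() + 64);
        for (String part : TAG_BOUNDARY.split(html)) {
            Matcher tagName = TAG_NAME.matcher(part);
            boolean isOpeningTag = part.endsWith(">") && tagName.find();
            if (isOpeningTag && !existing.matcher(part).find()) {
                // Insert right after the tag name; everything else in the
                // part, including line breaks and indentation, is copied as-is.
                out.append(part, 0, tagName.end())
                   .append(' ').append(name).append("=\"").append(value).append('"')
                   .append(part, tagName.end(), part.length());
            } else {
                out.append(part);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<ui:grid id=\"category_search\"\n"
                + "        class=\"is-small\"/>\n"
                + "<p data-service=\"x\">text</p>";
        System.out.println(addAttribute(html, "data-service", "demo"));
    }
}
```

Because the new attribute is spliced in directly after the tag name and the rest of each part is left untouched, the original wrapping of long attribute lists survives. Tags that already carry the attribute are skipped, so running it twice should not add duplicates.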
