
How to keep OWASP HTML sanitizer from limiting line length?

I have to put several hundred thousand very old HTML documents into a web application. I got great results with the OWASP HTML Sanitizer and was able to make sure that properly sanitized HTML is produced. My only problem is that the HTML Sanitizer puts a hard limit on the maximum line length: to be exact, a maximum of 250 bytes per line. Unfortunately, this causes some words to be split in the middle, and the split also shows up in the displayed HTML (marked with a caret):

This sentence here is perfectly ok. But in the next s entence there is an additional space in the word "sentence".
                                                     ^

How can I tell the sanitizer not to end the lines too soon?

As some of the lines in the original HTML are 800 bytes or more, it would also help if I could tell the sanitizer to insert breaks only at whitespace.
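Breaking only at whitespace can be sketched as a small word-wrap routine. This is a minimal sketch in Java (the OWASP HTML Sanitizer discussed here is a Java library), assuming the limit is counted in characters rather than UTF-8 bytes; `WhitespaceWrapper` and `wrap` are hypothetical names, not part of the sanitizer's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of OWASP HTML Sanitizer): wraps text so that
// no line exceeds maxLen characters, breaking only at whitespace. A single
// word longer than maxLen is kept intact on its own line instead of being cut.
public class WhitespaceWrapper {

    public static String wrap(String text, int maxLen) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) continue; // input was empty or only whitespace
            // Start a new line if appending " " + word would exceed the limit.
            if (current.length() > 0 && current.length() + 1 + word.length() > maxLen) {
                lines.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) current.append(' ');
            current.append(word);
        }
        if (current.length() > 0) lines.add(current.toString());
        return String.join("\n", lines);
    }

    public static void main(String[] args) {
        System.out.println(wrap("But in the next sentence there is no split word.", 17));
    }
}
```

With a 17-character limit this prints `But in the next`, `sentence there is` and `no split word.` on three lines; no word is cut. Because the sketch collapses all existing whitespace, it treats its input as plain text and should not be applied to serialized HTML containing `<pre>` blocks or whitespace-sensitive attribute values.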

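This is not an answer but a confession: the truncated lines were caused by another part of my own code, which imposed a line length limit on the output.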
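To see why a line-length limit in the output code shows up as an extra space, here is a hypothetical reproduction in Java; `FixedWidthSplitter`, `splitEvery` and `renderLikeBrowser` are made-up names, not the poster's actual code. It cuts text every `maxLen` characters regardless of word boundaries, then roughly models how a browser displays normal running text, where a newline collapses to a single space (the default `white-space: normal` behavior).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reproduction of the bug: output code that enforces a maximum
// line length by cutting at a fixed character count, ignoring word boundaries.
public class FixedWidthSplitter {

    // Splits text into chunks of at most maxLen characters joined by newlines.
    public static String splitEvery(String text, int maxLen) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < text.length(); i += maxLen) {
            lines.add(text.substring(i, Math.min(text.length(), i + maxLen)));
        }
        return String.join("\n", lines);
    }

    // Rough model of browser rendering: any run of whitespace, including
    // newlines, is displayed as a single space.
    public static String renderLikeBrowser(String text) {
        return text.replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String html = "But in the next sentence there is an extra space.";
        System.out.println(renderLikeBrowser(splitEvery(html, 17)));
    }
}
```

With a 17-character limit the cut lands between the "s" and "entence" of "sentence", so the rendered text reads `But in the next s entence there is an extra space.`, the same symptom as in the question. Raising or removing the limit in that output code, rather than changing the sanitizer, avoids the split.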

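Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.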

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM