JTidy reports “3 errors were found!”… but does not say what they are
I have a large block of programmatically generated HTML. I ran it through Tidy (version r938) with the following Java code:
import java.io.StringReader;
import java.io.StringWriter;
import org.w3c.tidy.Tidy;

StringReader inStr = new StringReader(htmlInput);
StringWriter outStr = new StringWriter();
Tidy tidy = new Tidy();
tidy.setXHTML(true);
tidy.parseDOM(inStr, outStr);
I get the following output:
InputStream: Document content looks like HTML 4.01 Transitional
247 warnings, 3 errors were found!
This document has errors that must be fixed before
using HTML Tidy to generate a tidied up version.
Trouble is, Tidy doesn't tell me which 3 errors it found.
I'm fibbing here a little. The output above actually follows a long list of all 247 warnings (mostly trimming out empty div elements). I can suppress those with tidy.setShowWarnings(false); either way, I see no error report, so I can't figure out what I need to fix. 300 KB of HTML is too much for me to eyeball.
I've tried numerous approaches to finding the error. I can't run it through validate.w3.org, sadly, as the HTML file is on a proprietary network. The most informative approach was to open it in IntelliJ IDEA; this revealed a dozen or so duplicate div IDs, which I fixed. Errors still occurred.
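Since the file can't leave the network, a rough local check for the kind of duplicate IDs IntelliJ found is easy to script. This is a sketch, not a validator: DuplicateIdFinder is a hypothetical helper (not part of JTidy or IntelliJ), and it uses a regular expression rather than an HTML parser, so it only sees quoted id values that sit on one line and will also count ids inside comments or scripts.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, regex-based duplicate-id check (not a real HTML parser).
 * Only quoted values (id="x" or id='x') on a single line are seen.
 */
class DuplicateIdFinder {
    // "id" must not be preceded by a word character or '-', so data-id= and valid= are skipped.
    private static final Pattern ID_ATTR =
            Pattern.compile("(?<![\\w-])id\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    /** Returns every id used more than once, mapped to the 1-based lines it appears on. */
    static Map<String, List<Integer>> findDuplicates(String html) {
        Map<String, List<Integer>> seen = new LinkedHashMap<>();
        String[] lines = html.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            Matcher m = ID_ATTR.matcher(lines[i]);
            while (m.find()) {
                seen.computeIfAbsent(m.group(2), k -> new ArrayList<>()).add(i + 1);
            }
        }
        // Keep only ids that occur at least twice.
        seen.values().removeIf(occurrences -> occurrences.size() < 2);
        return seen;
    }

    public static void main(String[] args) {
        String html = "<div id=\"nav\"></div>\n<div id=\"main\"></div>\n<div id=\"nav\"></div>";
        findDuplicates(html).forEach((id, lines) ->
                System.out.println("id \"" + id + "\" used on lines " + lines));
    }
}
```

For a real file you would read it into a String first (for example with java.nio.file.Files.readString) and pass that in.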
I've looked around for other mentions of this problem. While I find plenty of hits on things like "How can I get the error/warning messages out of the parsed HTML using JTidy?", they all appear to be asking for dissimilar things, or assume conditions that simply don't hold for me. I'm getting warnings just fine, for example; it's the errors I need, and they're not being reported, even if I call setShowErrors(100) or something.
Am I going to have to dive into Tidy's source code and debug it, starting where it reports errors? Or is there something much simpler I could do?
Here's what I ended up doing to track down the errors:
The first line of org.w3c.tidy.Report.error() increments lexer.errors, so that is where to stop in the debugger. error() is called from many places in the lexer. When it fires, look at lexbuf. lexbuf is a byte array, so your IDE might not show it as text. It might also be large, so narrow it down to the section of lexbuf that the lexer is looking at. If you have to, take that section of the byte array and cross-reference it with an ASCII table to get the text.

This was much more involved than it probably should have been. I suspect Report.error() was being called inappropriately.
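The ASCII-table lookup described above can be done mechanically: decoding the interesting slice of the byte array with the JDK gives the text directly. This is a sketch under two assumptions: that lexbuf holds UTF-8 bytes (plain ASCII decodes identically), and that you pick the start and end indices yourself; which lexer field marks the current position is something to confirm in the JTidy source. LexbufText is a hypothetical helper, not part of JTidy.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: turns part of a byte buffer (e.g. a lexer's lexbuf) into readable text. */
class LexbufText {
    /** Decodes buf[start, end) as UTF-8; out-of-range bounds are clamped to the buffer. */
    static String slice(byte[] buf, int start, int end) {
        int from = Math.max(0, Math.min(start, buf.length));
        int to = Math.max(from, Math.min(end, buf.length));
        // A slice that cuts a multi-byte character shows U+FFFD at that edge.
        return new String(buf, from, to - from, StandardCharsets.UTF_8);
    }

    /** Shows up to radius bytes on each side of pos, with '|' marking pos. */
    static String around(byte[] buf, int pos, int radius) {
        return slice(buf, pos - radius, pos) + "|" + slice(buf, pos, pos + radius);
    }

    public static void main(String[] args) {
        byte[] buf = "<p>some text</p>".getBytes(StandardCharsets.UTF_8);
        System.out.println(around(buf, 7, 4));
    }
}
```

In most IDE debuggers you can also evaluate the core expression in place, e.g. new String(lexbuf, from, length, java.nio.charset.StandardCharsets.UTF_8), with from and length chosen by hand.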
In my case, error() was called with the constant BAD_CDATA_CONTENT. This constant is used only by Report.warning(); error() doesn't know what to do with it, and just exits silently with no message at all. If I change the call in Lexer.getCDATA() from error() to warning(), I get the exact line and column of my error. (I also get what appears to be reasonably well-formed XHTML, instead of an empty document.)
I'd submit a ticket to the JTidy project with some suggestions, but SourceForge isn't letting me log in for some reason. So, here:

(… a comment inside the script element; it shouldn't have hurt anything. I asked another question about it, just in case.)
Report.error() should have a default case that reports an unhandled error code if it gets one.

Hope this helps anyone else having what I'm guessing is a rather esoteric problem.
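The suggested default case in Report.error() might look roughly like this. Again, this is a hypothetical sketch rather than a patch against JTidy: the class, constants and message text are illustrative only. With a default branch, every call that increments the counter also produces a message, so the error count and the printed report can no longer disagree.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the suggested fix (NOT a patch to JTidy's Report class). */
class DefaultingReporter {
    // Illustrative codes; the values are made up.
    static final int KNOWN_ERROR = 1;
    static final int BAD_CDATA_CONTENT = 2;

    int errors = 0;
    final List<String> messages = new ArrayList<>();

    void error(int line, int column, int code) {
        errors++;
        String where = "line " + line + " column " + column + " - Error: ";
        switch (code) {
            case KNOWN_ERROR:
                messages.add(where + "known problem");
                break;
            default:
                // Never fail silently: surface the raw code so it can be looked up.
                messages.add(where + "unhandled error code " + code);
                break;
        }
    }

    public static void main(String[] args) {
        DefaultingReporter r = new DefaultingReporter();
        r.error(120, 9, BAD_CDATA_CONTENT);
        System.out.println("errors=" + r.errors + ", messages=" + r.messages);
    }
}
```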
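Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.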