Logback System.err output uses wrong encoding
I'm using Logback 1.2.11 with Java 17 on Windows 10, with the following logback.xml:
<configuration>
<property scope="context" name="COLORIZER_COLORS" value="boldred@,boldyellow@,boldcyan@,@,@" />
<conversionRule conversionWord="colorize" converterClass="org.tuxdude.logback.extensions.LogColorizer" />
<statusListener class="ch.qos.logback.core.status.NopStatusListener" />
<appender name="STDERR" class="ch.qos.logback.core.ConsoleAppender">
<target>System.err</target>
<withJansi>true</withJansi>
<encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
<pattern>[%colorize(%level)] %msg%n</pattern>
</encoder>
</appender>
<root level="INFO">
<appender-ref ref="STDERR" />
</root>
</configuration>
If in my code I use System.out.println("é") or System.err.println("é"), I see an é (U+00E9, a small letter e with acute accent) on the console as expected. However, if I log through Logback (via SLF4J), it shows a Θ character (U+0398, a Greek capital letter theta) on the screen. This happens whether I use <target>System.out</target> or <target>System.err</target> in my logback.xml file.
By default the PatternLayoutEncoder for ConsoleAppender should be using the system default encoding. (See "LogBack default charset for LayoutWrappingEncoder?" for extensive discussion.) The Windows 10 console encoding in my locale should be Windows-1252 (or in PowerShell, ISO-8859-1). The Θ character doesn't even appear in either of those charsets.
Why is Logback printing a Θ character to the standard output when it should be printing an é character? More generally, why isn't Logback using the default encoding when printing to System.out or System.err?
It looks like Logback is using the wrong "default charset". The API Javadocs of System.out say this about its default charset (which applies to System.err as well):
The "standard" output stream. This stream is already open and ready to accept output data. Typically this stream corresponds to display output or another output destination specified by the host environment or user. The encoding used in the conversion from characters to bytes is equivalent to Console.charset() if the Console exists, Charset.defaultCharset() otherwise.
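The selection rule quoted from the Javadocs can be sketched in a few lines of plain Java. This is only an illustration of the documented rule, not code taken from the JDK or from Logback; the class name StdoutCharsetProbe and the method name expectedStdoutCharset are made up for this example, and Console.charset() requires Java 17 or later.

```java
import java.io.Console;
import java.nio.charset.Charset;

public class StdoutCharsetProbe {

    /**
     * Hypothetical helper that applies the documented rule:
     * Console.charset() if a Console exists, Charset.defaultCharset() otherwise.
     */
    static Charset expectedStdoutCharset() {
        // System.console() may be null, for example when output is redirected.
        Console console = System.console();
        return console != null ? console.charset() : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        Console console = System.console();
        System.out.println("Charset.defaultCharset()   = " + Charset.defaultCharset());
        System.out.println("System.console().charset() = "
                + (console == null ? "(no console)" : console.charset()));
        System.out.println("expected stdout charset    = " + expectedStdoutCharset());
    }
}
```

Running this in the same terminal as the application shows whether the two charsets differ. The resulting Charset is the kind of value one would want an encoder writing to the console to use.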
On my Windows 10 Command Prompt, Charset.defaultCharset() returns windows-1252, while System.console().charset() returns IBM437. If I create a new OutputStreamWriter(System.out, System.console().charset()) and write the string "é", it produces é as expected. But sure enough, if I use new OutputStreamWriter(System.out, Charset.defaultCharset()) and write "é", it produces Θ! So that's where the Θ was coming from: windows-1252 encodes é as the byte 0xE9, and in the IBM437 charset used by the console that byte is Θ!
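The mismatch can be reproduced without a console at all by encoding with one charset and decoding with the other. The sketch below is an illustration under the assumption that the JDK in use ships both windows-1252 and IBM437 (standard full JDK builds do). The class name CharsetMismatchDemo and the method renderedAs are invented for this example. Code points are printed instead of the characters themselves so that the demo's own output does not depend on the terminal encoding.

```java
import java.nio.charset.Charset;

public class CharsetMismatchDemo {

    /**
     * Encodes text with the charset the writer assumes, then decodes the bytes
     * with the charset the console actually uses, i.e. what ends up on screen.
     */
    static String renderedAs(String text, Charset writerCharset, Charset consoleCharset) {
        byte[] bytes = text.getBytes(writerCharset);
        return new String(bytes, consoleCharset);
    }

    public static void main(String[] args) {
        Charset windows1252 = Charset.forName("windows-1252");
        Charset ibm437 = Charset.forName("IBM437");
        String eAcute = "\u00E9"; // é

        // The same character maps to different bytes in the two charsets.
        System.out.printf("windows-1252 byte: 0x%02X%n", eAcute.getBytes(windows1252)[0] & 0xFF); // 0xE9
        System.out.printf("IBM437 byte:       0x%02X%n", eAcute.getBytes(ibm437)[0] & 0xFF);      // 0x82

        // Writer uses windows-1252, console interprets bytes as IBM437.
        String wrong = renderedAs(eAcute, windows1252, ibm437);
        System.out.printf("mismatched: U+%04X%n", wrong.codePointAt(0)); // U+0398 (Θ)

        // Writer and console agree on IBM437.
        String right = renderedAs(eAcute, ibm437, ibm437);
        System.out.printf("matched:    U+%04X%n", right.codePointAt(0)); // U+00E9 (é)
    }
}
```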
I won't ask here why my Windows 10 Command Prompt is defaulting to IBM437 as its default charset; in the context of this issue, that's beside the point.
The root problem seems to be that Logback is retrieving the default character set erroneously. (It's a long story, but basically Logback is relying on the default charset of String.getBytes().) Ultimately Logback in LayoutWrappingEncoder is relying on the value of Charset.defaultCharset(), which doesn't match that of the console; instead it should be defaulting to System.console().charset() if it wants to match the default charset of the console.
Apparently the LayoutWrappingEncoder doesn't know if it's writing to the console or some other output stream that in fact uses Charset.defaultCharset(). Perhaps there needs to be some way that ch.qos.logback.core.OutputStreamAppender can expose its charset to LayoutWrappingEncoder, and ch.qos.logback.core.ConsoleAppender can override the default based on System.console().charset() instead of Charset.defaultCharset().
In any case the culprit here seems to be Logback using the wrong default charset for the console for System.out and System.err. (Anyone know how I can tell Logback to use System.console().charset() instead of Charset.defaultCharset()? I certainly don't have any way of knowing the default console charset ahead of time, so I can't hard-code it into logback.xml.)
I have filed Logback bug LOGBACK-1642.