
Logback System.err output uses wrong encoding

I'm using Logback 1.2.11 with Java 17 on Windows 10, with the following logback.xml:

<configuration>
  <property scope="context" name="COLORIZER_COLORS" value="boldred@,boldyellow@,boldcyan@,@,@" />
  <conversionRule conversionWord="colorize" converterClass="org.tuxdude.logback.extensions.LogColorizer" />
  <statusListener class="ch.qos.logback.core.status.NopStatusListener" />
  <appender name="STDERR" class="ch.qos.logback.core.ConsoleAppender">
    <target>System.err</target>
    <withJansi>true</withJansi>
    <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
      <pattern>[%colorize(%level)] %msg%n</pattern>
    </encoder>
  </appender>
  <root level="INFO">
    <appender-ref ref="STDERR" />
  </root>
</configuration>

If my code calls System.out.println("é") or System.err.println("é"), I see an é (U+00E9, LATIN SMALL LETTER E WITH ACUTE) on the console as expected. However, if I log the same string through Logback (via SLF4J), a Θ (U+0398, GREEK CAPITAL LETTER THETA) appears on the screen instead. This happens whether I use <target>System.out</target> or <target>System.err</target> in my logback.xml file.

By default, the PatternLayoutEncoder for ConsoleAppender should use the system default encoding. (See "LogBack default charset for LayoutWrappingEncoder?" for an extensive discussion.) The Windows 10 console encoding in my locale should be Windows-1252 (or, in PowerShell, ISO-8859-1). The Θ character doesn't even appear in either of those charsets.
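One quick way to confirm that last claim is to ask each charset's encoder whether it can represent Θ at all. This is a minimal sketch using only CharsetEncoder.canEncode; the class name CharsetCoverage is illustrative, and it assumes the JDK includes the IBM437 charset (the old DOS code page), as standard builds do.

```java
import java.nio.charset.Charset;

// Checks whether a charset has any byte representation for a given character.
class CharsetCoverage {

    // True if the named charset can encode the character c.
    static boolean canEncode(String charsetName, char c) {
        return Charset.forName(charsetName).newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        char theta = '\u0398'; // GREEK CAPITAL LETTER THETA
        for (String name : new String[] {"windows-1252", "ISO-8859-1", "IBM437"}) {
            System.out.println(name + " can encode U+0398: " + canEncode(name, theta));
        }
    }
}
```

Neither windows-1252 nor ISO-8859-1 can encode U+0398, while IBM437 can, which already hints that the console is not using either of the charsets I expected.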

Why is Logback printing a Θ character to standard output when it should be printing an é? More generally, why isn't Logback using the default encoding when printing to System.out or System.err?

It looks like Logback is using the wrong "default charset". The Javadoc for System.out says this about its charset (the same applies to System.err):

The "standard" output stream. This stream is already open and ready to accept output data. Typically this stream corresponds to display output or another output destination specified by the host environment or user. The encoding used in the conversion from characters to bytes is equivalent to Console.charset() if the Console exists, Charset.defaultCharset() otherwise.
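The rule in that quote can be turned into a small diagnostic that prints the charset System.out should be using next to Charset.defaultCharset(). This is a sketch that assumes Java 17 or later (Console.charset() was added in Java 17); the class name StdoutCharset is illustrative. Note that on newer JDKs System.console() may return a Console even when output is redirected.

```java
import java.io.Console;
import java.nio.charset.Charset;

// Applies the System.out Javadoc rule: Console.charset() if a console exists,
// Charset.defaultCharset() otherwise.
class StdoutCharset {

    static Charset stdoutCharset() {
        Console console = System.console();
        return console != null ? console.charset() : Charset.defaultCharset();
    }

    public static void main(String[] args) {
        Charset expected = stdoutCharset();
        Charset fallback = Charset.defaultCharset();
        System.out.println("System.out charset (per Javadoc): " + expected);
        System.out.println("Charset.defaultCharset():         " + fallback);
        if (!expected.equals(fallback)) {
            // Non-ASCII text encoded with the default charset may be garbled here.
            System.out.println("Mismatch between console charset and default charset");
        }
    }
}
```

Run from an interactive Command Prompt, a mismatch here is exactly the situation described next.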

On my Windows 10 Command Prompt, Charset.defaultCharset() returns windows-1252, while System.console().charset() returns IBM437. If I create a new OutputStreamWriter(System.out, System.console().charset()) and write the string "é", it produces é as expected. But sure enough, if I use new OutputStreamWriter(System.out, Charset.defaultCharset()) and write "é", it produces Θ! So that's where the Θ was coming from: windows-1252 encodes é as the byte 0xE9, and in IBM437 the byte 0xE9 is Θ.
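The mismatch can be reproduced without a console at all: encode the text with the charset the writer uses, then decode the bytes with the charset the console uses. A minimal sketch; the class and method names are illustrative, and it assumes the JDK provides IBM437.

```java
import java.nio.charset.Charset;

// Simulates a writer and a console that disagree about the charset.
class Mojibake {

    static String misread(String text, Charset writtenWith, Charset readWith) {
        byte[] bytes = text.getBytes(writtenWith); // bytes the writer emits
        return new String(bytes, readWith);        // characters the console shows
    }

    public static void main(String[] args) {
        Charset windows1252 = Charset.forName("windows-1252");
        Charset ibm437 = Charset.forName("IBM437");
        String eAcute = "\u00E9"; // é

        // é is 0xE9 in windows-1252, and 0xE9 is Θ (U+0398) in IBM437.
        System.out.println(misread(eAcute, windows1252, ibm437).equals("\u0398")); // true
        // Writing with the console's own charset round-trips cleanly.
        System.out.println(misread(eAcute, ibm437, ibm437).equals(eAcute));        // true
    }
}
```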

I won't ask here why my Windows 10 Command Prompt uses IBM437 as its default charset; in the context of this issue, that's beside the point.

The root problem seems to be that Logback determines the default character set incorrectly. (It's a long story, but basically Logback relies on the default charset used by String.getBytes().) Ultimately, LayoutWrappingEncoder relies on the value of Charset.defaultCharset(), which doesn't match that of the console; to match the console's default charset, it should default to System.console().charset() instead.
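The fallback described above can be sketched as follows. This is an assumption about the shape of the behavior, not Logback's actual source: encodeLikeLayoutWrappingEncoder is a hypothetical method, and a null charset stands for "no charset configured in logback.xml". The point it shows is that String.getBytes() with no argument is documented to use Charset.defaultCharset().

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical model of an encoder that falls back to String.getBytes().
class DefaultCharsetFallback {

    static byte[] encodeLikeLayoutWrappingEncoder(String s, Charset configured) {
        // No configured charset: String.getBytes() uses Charset.defaultCharset().
        return configured == null ? s.getBytes() : s.getBytes(configured);
    }

    public static void main(String[] args) {
        String s = "\u00E9"; // é
        byte[] unconfigured = encodeLikeLayoutWrappingEncoder(s, null);
        byte[] viaDefault = s.getBytes(Charset.defaultCharset());
        System.out.println(Arrays.equals(unconfigured, viaDefault)); // true
    }
}
```

Nothing in that path consults the console, so on my machine the bytes are windows-1252 regardless of where they end up.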

Apparently LayoutWrappingEncoder doesn't know whether it's writing to the console or to some other output stream that really does use Charset.defaultCharset(). Perhaps ch.qos.logback.core.OutputStreamAppender needs some way to expose its charset to LayoutWrappingEncoder, so that ch.qos.logback.core.ConsoleAppender can override the default with System.console().charset() instead of Charset.defaultCharset().
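The proposed defaulting rule could look roughly like this. It is a hypothetical sketch of the idea, not an existing Logback API: ConsoleAwareCharsetPolicy and resolve are invented names, targetIsConsole stands in for whatever the appender would expose, and Java 17+ is assumed for Console.charset(). An explicitly configured charset always wins; otherwise a console target uses the console's charset when a console exists, and every other stream keeps Charset.defaultCharset().

```java
import java.io.Console;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical charset policy for an appender that knows its target.
class ConsoleAwareCharsetPolicy {

    static Charset resolve(Charset configured, boolean targetIsConsole) {
        if (configured != null) {
            return configured; // explicit configuration always wins
        }
        Console console = System.console();
        if (targetIsConsole && console != null) {
            return console.charset(); // same rule System.out itself follows
        }
        return Charset.defaultCharset(); // files and other streams
    }

    public static void main(String[] args) {
        System.out.println("Console target default: " + resolve(null, true));
        System.out.println("Other stream default:   " + resolve(null, false));
        System.out.println("Explicit UTF-8:         " + resolve(StandardCharsets.UTF_8, true));
    }
}
```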

In any case, the culprit seems to be that Logback uses the wrong default charset for the console when writing to System.out and System.err. (Does anyone know how I can tell Logback to use System.console().charset() instead of Charset.defaultCharset()? I have no way of knowing the console's default charset ahead of time, so I can't hard-code it into logback.xml.)

I have filed Logback bug LOGBACK-1642.

