Why does using Unicode in a properties file work but not the actual character even when file.encoding is set?

Here is the test.properties file.

mycharacters=ýþÿƛƸ
myotherchars=\u00FD\u00FE\u00FF\u019B\u01B8

Here is the code being used:

import java.awt.FlowLayout;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ResourceBundle;

import javax.swing.*;

public class MultiByteTest2 
{
    public MultiByteTest2()
    {
        ResourceBundle bundle = ResourceBundle.getBundle("test");
        JFrame frame = new JFrame("MultiByte Test");

        JPanel panel = new JPanel();
        panel.setLayout(new FlowLayout());

        JLabel label1 = new JLabel(bundle.getString("mycharacters"));
        JLabel label2 = new JLabel("  ---  " + bundle.getString("myotherchars"));

        panel.add(label1);
        panel.add(label2);

        String defaultCharacterEncoding = System.getProperty("file.encoding");
        System.out.println("defaultCharacterEncoding by property: " + defaultCharacterEncoding);
        System.out.println("defaultCharacterEncoding by code: " + getDefaultCharEncoding());
        System.out.println("defaultCharacterEncoding by charSet: " + Charset.defaultCharset());

        frame.add(panel);
        frame.setSize(300, 300);
        frame.setLocationRelativeTo(null);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);

    }
    public static void main(String s[]) 
    {
        MultiByteTest2 myObject = new MultiByteTest2();

    }

    public static String getDefaultCharEncoding(){
        byte [] bArray = {'w'};
        InputStream is = new ByteArrayInputStream(bArray);
        InputStreamReader reader = new InputStreamReader(is);
        String defaultCharacterEncoding = reader.getEncoding();
        return defaultCharacterEncoding;
    }

}

Here is the output:

[Screenshot of the window: the first label is garbled, the second label shows the expected characters.]

Command to run the above code and the output which shows UTF-8 being used.

 >java -Dfile.encoding=UTF-8 MultiByteTest2
 Picked up _JAVA_OPTIONS: -Dfile.encoding=UTF-8
 defaultCharacterEncoding by property: UTF-8
 defaultCharacterEncoding by code: UTF8
 defaultCharacterEncoding by charSet: UTF-8

Three questions:

  1. Why does using the actual characters result in a mess of characters being output?

  2. Why does using the Unicode representation work?

  3. The output shows UTF-8 instead of cp1252 which indicates the file.encoding is being used, but why does it not help when using the actual characters in the properties file?

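*.properties files loaded through ResourceBundle.getBundle (a PropertyResourceBundle) or Properties.load(InputStream) are decoded as ISO-8859-1 (Latin-1), regardless of file.encoding. This is a very old design decision. Your file is saved as UTF-8, so each multi-byte character is read as two or more separate Latin-1 characters, which is the mess you see. Characters written as \uXXXX escapes are plain ASCII in the file and are converted by the properties parser itself, so they come out right. (Since Java 9, ResourceBundle reads *.properties as UTF-8 by default; Properties.load(InputStream) still uses ISO-8859-1.)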
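To make the effect concrete, here is a minimal sketch that loads the same two entries from an in-memory copy of the file, once through an InputStream and once through a UTF-8 Reader. The class name PropertiesEncodingDemo and its helper methods are made up for this illustration; Properties.load(InputStream) and Properties.load(Reader) are the standard JDK methods.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class, not part of the question's code.
class PropertiesEncodingDemo {

    // The two entries of test.properties, encoded as UTF-8 the way an editor would save them.
    // The first value holds the real characters, the second holds backslash-u escapes as text.
    static final byte[] FILE_BYTES =
            ("mycharacters=\u00FD\u00FE\u00FF\u019B\u01B8\n"
           + "myotherchars=\\u00FD\\u00FE\\u00FF\\u019B\\u01B8\n")
                    .getBytes(StandardCharsets.UTF_8);

    // Properties.load(InputStream) always decodes the bytes as ISO-8859-1.
    static String readAsLatin1(String key) {
        Properties p = new Properties();
        try {
            p.load(new ByteArrayInputStream(FILE_BYTES));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty(key);
    }

    // Properties.load(Reader) uses whatever charset the caller gave the Reader.
    static String readAsUtf8(String key) {
        Properties p = new Properties();
        try {
            p.load(new InputStreamReader(
                    new ByteArrayInputStream(FILE_BYTES), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty(key);
    }

    public static void main(String[] args) {
        // 5 characters were stored as 10 UTF-8 bytes; Latin-1 turns each byte into one char.
        System.out.println(readAsLatin1("mycharacters").length()); // prints 10
        // The escaped form is pure ASCII, so the byte encoding does not matter.
        System.out.println(readAsLatin1("myotherchars").length()); // prints 5
        // Reading through a UTF-8 Reader recovers the real characters.
        System.out.println(readAsUtf8("mycharacters").length());   // prints 5
    }
}
```

If you load the file yourself instead of going through ResourceBundle.getBundle, the Reader variant is one way to keep the file in UTF-8.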

I think the cleanest solution would be to use the Properties class, possibly with XML properties (loadFromXML), since an XML file declares its own encoding. The XML could also be kept outside the application, which can be useful for internationalisation.
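A rough sketch of the XML variant, assuming the document is held in a string here instead of a file. The class name XmlPropertiesDemo and the helper read are invented for the example; Properties.loadFromXML and the DOCTYPE line are the ones described in the JDK documentation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical demo class; in a real application the XML would live in its own file.
class XmlPropertiesDemo {

    // The XML prolog names the encoding, so the parser does not fall back to Latin-1.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
          + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
          + "<properties>\n"
          + "  <entry key=\"mycharacters\">\u00FD\u00FE\u00FF\u019B\u01B8</entry>\n"
          + "</properties>\n";

    static String read(String key) {
        Properties p = new Properties();
        try {
            // The bytes are UTF-8, matching the encoding declared in the prolog.
            p.loadFromXML(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return p.getProperty(key);
    }

    public static void main(String[] args) {
        System.out.println(read("mycharacters").length()); // prints 5
    }
}
```

Note that this replaces ResourceBundle.getBundle with your own loading code, so locale fallback would have to be handled separately.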

Alternatively, a Maven build step could convert *.properties files that you maintain in UTF-8 into u-escaped *.properties files before packaging, for example as part of a resource copy with filtering.
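The conversion itself is simple. As a sketch of what such a build step has to do (this is not Maven configuration, and the class UnicodeEscaper is a made-up name), every character outside ASCII is replaced by its backslash-u escape, which is the same transformation the JDK's old native2ascii tool performed:

```java
// Hypothetical helper illustrating the UTF-8 to u-escaped conversion.
class UnicodeEscaper {

    // Replaces every non-ASCII UTF-16 code unit with a six-character escape.
    // Characters beyond the BMP come out as two escapes (a surrogate pair),
    // which is the form the properties parser expects.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints the escaped form of the first line of test.properties.
        System.out.println(escapeNonAscii("mycharacters=\u00FD\u00FE\u00FF\u019B\u01B8"));
    }
}
```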


Instead of a *.properties file (a PropertyResourceBundle), you could also use a ListResourceBundle: a Java class containing an array of texts. The resource path passed to ResourceBundle can be slightly different with respect to periods and slashes, but this frees you from the properties encoding, because the class is compiled with the IDE project's source encoding.
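A minimal sketch of such a bundle class. TestBundle and ListBundleDemo are hypothetical names, and the bundle is instantiated directly to keep the example in one file; for ResourceBundle.getBundle to find it, the class would need to be public and named after the bundle's base name. The value is written with escapes only so the snippet compiles under any source encoding; with a fixed project encoding you could type the characters directly.

```java
import java.util.ListResourceBundle;
import java.util.ResourceBundle;

// Hypothetical demo driver.
class ListBundleDemo {

    static String lookup(String key) {
        // Instantiated directly here; an application would normally go through
        // ResourceBundle.getBundle with a public bundle class.
        ResourceBundle bundle = new TestBundle();
        return bundle.getString(key);
    }

    public static void main(String[] args) {
        System.out.println(lookup("mycharacters").length()); // prints 5
    }
}

// Hypothetical bundle class standing in for test.properties.
class TestBundle extends ListResourceBundle {
    @Override
    protected Object[][] getContents() {
        // Each row is a key and its text; the strings are ordinary Java literals,
        // so no properties-file decoding is involved.
        return new Object[][] {
            { "mycharacters", "\u00FD\u00FE\u00FF\u019B\u01B8" },
        };
    }
}
```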
