White space as first character reading file java

Question

I have a config.txt at work that looks like this:

#test
@Email1
Vorname;Vorname:
Nachname;Nachname:
Anrede;Anrede:
Titel;Titel:
Firma;Firma:
Abteilung;Abteilung:
EMail;E-Mail:
Strasse;Strasse:
PLZ;PLZ:
Ort;Ort:
Land;Land:
Telefon;Telefon:
Fax;Fax:
Bemerkung;Bemerkung:
Stichwort1;Stichwort1:

@Email2
#Format: sqlSpaltenname;EmailFeldName
Suchfeld2;Suchfeld2:
Firma;FIRMA1:
Abteilung;ABTEILUNG:
Anrede;ANREDE:
Nachname;NAME:
Vorname;VORNAME:
Strasse;STRASSE:
PLZ;PLZ:
Ort;ORT:
Land;LAND:
Telefon;TELEFON:
EMail;EMAIL:
Stichwort1;STICHWORT1:
Stichwort2;STIcHWORT2:

I read the file with the following code:

public Config createConfig(String filename){
        config = new Config();
        String contentType = "";
        String[] temp;
        File fileDir = new File(filename);
        int counter = 0;
        specialConfigs = new ArrayList<String>(Arrays.asList("Betreff", "Sender", "Type", "Startbalken", "Endbalken"));

            try {
                BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(fileDir), "UTF8"));
                String line;
                while ((line = reader.readLine()) != null) {
                    counter++;
                    if(line.startsWith("#")){//auskommentiert
                        continue;
                    }
                    if(line.trim().length()==0){//leer
                        continue;
                    }
                    if(line.startsWith("@")){//Abschnittswechsel, @Email1, @Email2, @Einstellungen
                        contentType = line;
                        continue;
                    }

                    temp = line.split(";");
                    if(temp.length != 2){
                        if(specialConfigs.contains(temp[0]) && contentType.equals("@Email_Eigenschaften")){
                            parseSpecialContent(line);
                        } else {
                        Main.logger.warning("Fehler in der Konfigurationsdatei in Zeile: "+counter+"\nProgramm wird abgebrochen");
                        reader.close();
                        System.exit(0);
                        }
                    } else {
                        if(contentType.equals("@Email1")){
                            config.addEmailField(1, temp[0], temp[1]);
                        } else if(contentType.equals("@Email2")){
                            config.addEmailField(2, temp[0], temp[1]);
                        } else {
                            config.setParameter(temp[0], temp[1]);
                        }
                    }

                }
                reader.close();

            } catch (FileNotFoundException e) {
                Main.logger.severe("Fehler beim Einlesen der Konfigurationsdatei: "+filename+" Datei nicht gefunden."+"\nProgramm wird abgebrochen");
                e.printStackTrace();
                System.exit(0);
            } catch (IOException e) {
                Main.logger.severe("Fehler beim Einlesen der Konfigurationsdatei: "+filename+" Datei kann nicht gelesen werden."+"\nProgramm wird abgebrochen");
                e.printStackTrace();
                System.exit(0);
            }

        return config;
    }

I always get an error because the first line, #test, ends up in memory as " #test": it gets a leading whitespace character. I tried removing the first line and rewriting the whole file, but no matter what I change, it is always read as " #test". In debug mode I manually changed the value of the variable line to the correct value #test, and everything worked perfectly. The program also ran fine earlier. The file was only modified to contain the value EMail instead of Email, as in the earlier version. The earlier version still works. Can anyone help?

Answer 1

It is probably a UTF-8 BOM (byte order mark), \uFEFF, a zero-width no-break space that is used as a marker for detecting the Unicode encoding of a file: UTF-8, UTF-16LE, UTF-16BE and others.

It is redundant for UTF-8 (not needed), but it allows Windows Notepad to distinguish the local encoding from UTF-8. Maybe the file was edited with Notepad and saved as Unicode (with BOM).

A solution would be

            while ((line = reader.readLine()) != null) {
                line = line.replaceFirst("^\uFEFF", "");

That does a bit more than necessary, since presumably only the first line is affected.
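To limit the check to the first line, something like the following could work. This is only a sketch: the class name BomFirstLine and the helpers stripBom and readLines are made up for illustration and are not part of the original program, and a StringReader stands in for the real config file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class BomFirstLine {
    // U+FEFF is the byte order mark once the bytes have been decoded to chars.
    private static final char BOM = '\uFEFF';

    /** Removes a single leading BOM if present; otherwise returns the line unchanged. */
    static String stripBom(String line) {
        if (!line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    /** Reads all lines and strips the BOM from the first line only. */
    static List<String> readLines(BufferedReader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        boolean first = true;
        while ((line = reader.readLine()) != null) {
            if (first) {
                line = stripBom(line);
                first = false;
            }
            lines.add(line);
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Simulated file content that starts with a BOM (hypothetical sample data).
        String content = "\uFEFF#test\n@Email1\nVorname;Vorname:\n";
        List<String> lines = readLines(new BufferedReader(new StringReader(content)));
        System.out.println(lines.get(0).startsWith("#")); // true
    }
}
```

In createConfig the same idea would mean calling stripBom on line only when counter is 1, before the startsWith("#") check.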

Answer 2

String#trim() only removes characters equal to or less than the space character (\u0020) from both ends of the string, but there are many more characters that count as "whitespace".

Instead of using trim(), use a regex to remove all non-Graph (i.e. non-visible) characters at the start or end of the string:

line = line.replaceAll("^\\P{Graph}+|\\P{Graph}+$", "");
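A small check of the difference between trim() and the regex, as a sketch. The class name TrimVersusRegex and the helper stripNonGraph are hypothetical. One caveat worth knowing: in java.util.regex, \p{Graph} covers US-ASCII only by default, so the regex also strips non-ASCII letters such as umlauts at the ends of a line. If the config may contain such values, removing \uFEFF explicitly is probably the safer choice.

```java
public class TrimVersusRegex {
    /** The cleanup from the answer: drop non-Graph characters at both ends of the line. */
    static String stripNonGraph(String line) {
        return line.replaceAll("^\\P{Graph}+|\\P{Graph}+$", "");
    }

    public static void main(String[] args) {
        String line = "\uFEFF#test";

        // trim() leaves the BOM in place because U+FEFF is greater than U+0020.
        System.out.println(line.trim().startsWith("#"));          // false

        // The regex removes it, so the comment check succeeds again.
        System.out.println(stripNonGraph(line).startsWith("#"));  // true

        // Caveat: a leading A-umlaut (U+00C4) is not ASCII, so it is stripped too.
        System.out.println(stripNonGraph("\u00C4rger").equals("rger")); // true
    }
}
```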
