Java String.startsWith() “seems” not to work for the first line of a text file

I have a text file like this, and I want to parse information from the text file.

#title キミと☆Are You Ready?
#artist トライクロニカ
#mobile deresimu
#easy 0
#normal 22
#hard 27
#tag SHOW BY ROCK!!
#preset all

I used this code to parse it.

File infoFile = new File(dir, "info.txt");
//parse info.txt
String songName="?";
String artist = "?";
int difficulties[] = new int[5];

try {
    BufferedReader br = new BufferedReader(new FileReader(infoFile));
    String line = br.readLine();
    while (line != null) {
        Log.v(TAG, "line=" + line);
        //I hate BOM!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        /*
        <a href="http://www.faqs.org/rfcs/rfc3629.html">RFC 3629 - UTF-8, a transformation format of ISO 10646</a>
        *
        * <p>The
         * <a href="http://www.unicode.org/unicode/faq/utf_bom.html">Unicode FAQ</a>
    * defines 5 types of BOMs:<ul>
        * <li><pre>00 00 FE FF  = UTF-32, big-endian</pre></li>
        * <li><pre>FF FE 00 00  = UTF-32, little-endian</pre></li>
         * <li><pre>FE FF        = UTF-16, big-endian</pre></li>
        * <li><pre>FF FE        = UTF-16, little-endian</pre></li>
         * <li><pre>EF BB BF     = UTF-8</pre></li>
        * </ul></p>
        *
        * https://stackoverflow.com/questions/1835430/byte-order-mark-screws-up-file-reading-in-java
         */
        line=line.replace("\u00EF\u00BB\u00BF", "");
        line=line.replace("\u0000 \u0000 \u00FE \u00FF","");
        line=line.replace("\u00FF \u00FE \u0000 \u0000","");
        line=line.replace("\u00FE \u00FF","");
        line=line.replace("\u00FF \u00FE","");
        if (line.startsWith("#title")) {
            Log.v(TAG, "startswith");
            line = line.replace("#title ", "").trim();
            songName = line;
        } else if (line.startsWith("#artist")) {
            line = line.replace("#artist ", "").trim();
            artist = line;
        } else if (line.startsWith("#easy")) {
            difficulties[0] = Integer.parseInt(line.replace("#easy ", "").trim());

        } else if (line.startsWith("#normal")) {
            difficulties[1] = Integer.parseInt(line.replace("#normal ", "").trim());

        } else if (line.startsWith("#hard")) {
            difficulties[2] = Integer.parseInt(line.replace("#hard ", "").trim());
        } else if (line.startsWith("#master")) {
            difficulties[3] = Integer.parseInt(line.replace("#master ", "").trim());
        } else if (line.startsWith("#apex")) {
            difficulties[4] = Integer.parseInt(line.replace("#apex ", "").trim());
        }
        line = br.readLine();
    }
} catch (IOException | NumberFormatException e) {
    throw new RuntimeException(e);
}
//info.txt parse done.
Log.v(TAG, "Info.txt parse done.");
Log.v(TAG, "Song name=" + songName);
Log.v(TAG, "Difficulties=" + Arrays.toString(difficulties));
Log.v(TAG, "Artist=" + artist);
Log.v(TAG, "Folder=" + dir.getName());

All the other lines parse fine; only the first line fails. The condition if (line.startsWith("#title")) never seems to be true for the given text file. When I change startsWith to contains, it works.

At first I thought it was a BOM problem, so I added the five lines that remove BOM sequences. That did not help: the variable songName is still "?" whenever I use startsWith on the first line.

Any clue why this code cannot match #title? Thanks.

Logcat output:

2019-03-10 23:00:22.872 23600-23600/sma.rhythmtapper V/NoteFile: line=#title キミと☆Are You Ready?
2019-03-10 23:00:22.872 23600-23600/sma.rhythmtapper V/NoteFile: line=#artist トライクロニカ
2019-03-10 23:00:22.872 23600-23600/sma.rhythmtapper V/NoteFile: line=#mobile deresimu
2019-03-10 23:00:22.873 23600-23600/sma.rhythmtapper V/NoteFile: line=#easy 0
2019-03-10 23:00:22.873 23600-23600/sma.rhythmtapper V/NoteFile: line=#normal 22
2019-03-10 23:00:22.873 23600-23600/sma.rhythmtapper V/NoteFile: line=#hard 27
2019-03-10 23:00:22.874 23600-23600/sma.rhythmtapper V/NoteFile: line=#tag SHOW BY ROCK!!
2019-03-10 23:00:22.876 23600-23600/sma.rhythmtapper V/NoteFile: line=#preset all
2019-03-10 23:00:22.876 23600-23600/sma.rhythmtapper V/NoteFile: Info.txt parse done.
2019-03-10 23:00:22.876 23600-23600/sma.rhythmtapper V/NoteFile: Song name=?
2019-03-10 23:00:22.877 23600-23600/sma.rhythmtapper V/NoteFile: Difficulties=[0, 22, 27, 0, 0]
2019-03-10 23:00:22.877 23600-23600/sma.rhythmtapper V/NoteFile: Artist=トライクロニカ
2019-03-10 23:00:22.877 23600-23600/sma.rhythmtapper V/NoteFile: Folder=キミと☆Are You Ready?

EDIT

I located the problem by printing the byte sequence of each string to logcat. It showed:

"#title キミと☆Are You Ready?" -> [-17, -69, -65, 35, 116, 105, 116, 108, 101, 32, -29, -126, -83, -29, -125, -97, -29, -127, -88, -30, -104, -122, 65, 114, 101, 32, 89, 111, 117, 32, 82, 101, 97, 100, 121, -17, -68, -97]

"#title" -> [35, 116, 105, 116, 108, 101]

So I need to remove the leading bytes -17, -69, -65 (0xEF 0xBB 0xBF, the UTF-8 BOM) from the line variable. How can I do that without using an external library?

So the suspicion that a BOM caused the problem was correct.
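The check that confirmed this, dumping the bytes of a string, can be sketched roughly as below. It assumes the bytes are taken as UTF-8; the class name ByteDump and the method dump are made up for illustration, and U+FEFF stands in for the invisible character at the start of the first line.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class ByteDump {
    // Returns the UTF-8 bytes of a string as signed decimals, like the dump above.
    public static String dump(String s) {
        return Arrays.toString(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // A line that starts with the invisible U+FEFF character.
        System.out.println(dump("\uFEFF#title")); // [-17, -69, -65, 35, 116, 105, 116, 108, 101]
        // The literal being compared against.
        System.out.println(dump("#title"));       // [35, 116, 105, 116, 108, 101]
    }
}
```

The three extra leading values are what makes startsWith fail while contains still succeeds.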

I also changed the BOM-removing code to this:

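// The first three patterns are left over from my earlier attempt and are unlikely to match anything.
line=line.replace("\uEFBB\u00BF", "");
line=line.replace("\u0000\uFEFF","");
line=line.replace("\uFFFE\u0000","");
// This is the one that matters here: the UTF-8 BOM decodes to the single char U+FEFF.
line=line.replace("\uFEFF","");
line=line.replace("\uFFFE","");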
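A narrower variant, as a minimal sketch rather than a definitive fix: strip a single leading U+FEFF and leave the rest of the line untouched. The class BomStrip and the helper stripBom are hypothetical names, and a StringReader stands in for the FileReader over info.txt so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

final class BomStrip {
    // The UTF-8 BOM bytes EF BB BF decode to this single char.
    static final char BOM = '\uFEFF';

    // Hypothetical helper: drops one leading U+FEFF, returns other input unchanged.
    public static String stripBom(String line) {
        if (line != null && !line.isEmpty() && line.charAt(0) == BOM) {
            return line.substring(1);
        }
        return line;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real file: first line carries a BOM, second does not.
        String content = "\uFEFF#title Test\n#artist Someone\n";
        BufferedReader br = new BufferedReader(new StringReader(content));
        String first = stripBom(br.readLine());
        System.out.println(first.startsWith("#title")); // true
    }
}
```

In the parsing loop this would only need to run on the first line, before the startsWith checks. Opening the file through an InputStreamReader with StandardCharsets.UTF_8 instead of relying on FileReader's default charset is a separate precaution, not something the original code does.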

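Be careful about:

  • the whitespace: my first attempt had spaces between the escapes, which a real BOM never contains
  • \u00EF (a char, U+00EF) != byte 0xEF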
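The second point can be checked directly. The sketch below, with the made-up class name CharVsByte, decodes the three BOM bytes as UTF-8 and compares the result with the three-char string my first attempt searched for.

```java
import java.nio.charset.StandardCharsets;

final class CharVsByte {
    // Decodes the three UTF-8 BOM bytes the way a UTF-8 reader would.
    public static String decodedBom() {
        byte[] bom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};
        return new String(bom, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = decodedBom();
        System.out.println(decoded.length());                       // 1
        System.out.println(decoded.equals("\uFEFF"));               // true
        // The string from the first attempt is three separate chars.
        System.out.println("\u00EF\u00BB\u00BF".length());          // 3
        System.out.println(decoded.contains("\u00EF\u00BB\u00BF")); // false
    }
}
```

Three bytes in the file become one char in the String, so a pattern built from three Latin-1 chars has nothing to match.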

Thank you to everybody who tried to help me. I hope others who run into the same issue find this post useful.


 