
How to read an Excel (.xls) file as text?

I need to read an Excel (.xls) file that I'm receiving. I tried the usual charsets (UTF-8, Cp1252, ISO-8859-1, UTF-16LE), but none of them helped; the characters are still malformed.

So I searched and ended up using juniversalchardet. It reported the charset as MacCyrillic, so I read the file as MacCyrillic, but got the same weird result.

When I open the file in Excel everything is fine and all the characters are correct; since it's Portuguese, it's full of Ç, ~ and such. But when I open it with Notepad or through Java, the file is all messed up. If I open the file in Excel and save it again as .txt, it becomes readable.

My method to find the charset:

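    import java.io.FileInputStream;
    import java.io.IOException;
    import org.mozilla.universalchardet.UniversalDetector;

    public static void lerCharset(String fileName) throws IOException {
        // A small buffer is enough; the detector consumes the file chunk by chunk.
        byte[] buf = new byte[4096];

        // (1) Create the detector.
        UniversalDetector detector = new UniversalDetector(null);

        // try-with-resources closes the stream even if reading fails.
        try (FileInputStream fis = new FileInputStream(fileName)) {
            // (2) Feed data until the detector is done or the file ends.
            int nread;
            while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
                detector.handleData(buf, 0, nread);
            }
        }

        // (3) Signal that no more data is coming.
        detector.dataEnd();

        // (4) Report the result.
        String encoding = detector.getDetectedCharset();
        if (encoding != null) {
            System.out.println("Detected encoding = " + encoding);
        } else {
            System.out.println("No encoding detected.");
        }

        // (5) Reset the detector so it can be reused.
        detector.reset();
    }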

How can I discover the correct charset? Should I try a different approach, like having my Java code re-save the Excel file and then reading it?

If I'm understanding your question, you're trying to read the Excel file as if it were a text file.

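The challenge is that .xls files are actually binary files containing the text, formatting, sheet information, macro information, etc. No charset will decode the whole file into readable text, which is why every encoding you tried (including the detected MacCyrillic) gives garbage.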
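One way to see this for yourself: legacy .xls files are stored in an OLE2 compound-document container, which always begins with the eight bytes D0 CF 11 E0 A1 B1 1A E1. The sketch below checks for that signature using only the JDK. The class and method names (XlsSniffer, looksLikeOle2, isBinaryXls) are made up for illustration and are not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: tells a binary OLE2 container apart from plain text.
final class XlsSniffer {
    // Signature that opens every OLE2 compound document (the container used by legacy .xls).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Returns true if the given bytes start with the OLE2 signature.
    static boolean looksLikeOle2(byte[] header) {
        if (header.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if (header[i] != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads the first eight bytes of the file and checks them against the signature.
    static boolean isBinaryXls(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] header = new byte[OLE2_MAGIC.length];
            int total = 0;
            while (total < header.length) {
                int n = in.read(header, total, header.length - total);
                if (n < 0) {
                    break; // file is shorter than the signature
                }
                total += n;
            }
            return total == header.length && looksLikeOle2(header);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java XlsSniffer <file>");
            return;
        }
        boolean binary = isBinaryXls(Paths.get(args[0]));
        System.out.println(binary
                ? "OLE2 container: use a spreadsheet library, not a charset."
                : "No OLE2 signature: may be text (CSV/HTML) with an .xls name.");
    }
}
```

If the check fails, the file may be an HTML or delimited-text export that was merely given an .xls name, in which case reading it as text with the right charset could work after all.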

You'd need to either save the files as .csv (via Excel before running your program, or from your program directly), convert them to .xlsx (a ZIP archive of XML parts, which numerous libraries can read), use a library such as Apache POI or something similar, or even query the data out using ADO.
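For the first option, once the sheet has been exported to .csv or .txt, reading it is an ordinary text problem. Here is a minimal sketch, assuming the export is semicolon-delimited and encoded as windows-1252 (common for Excel on a Western European Windows setup, but verify this for your own files). ExportedCsvReader and readRows are hypothetical names, and the naive split does not handle quoted fields.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: reads a delimited text export with an explicit charset.
final class ExportedCsvReader {
    // Splits every line on the delimiter; quoted fields are NOT handled.
    static List<String[]> readRows(Path file, Charset charset, String delimiter)
            throws IOException {
        List<String[]> rows = new ArrayList<>();
        String splitRegex = Pattern.quote(delimiter); // treat the delimiter literally
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(line.split(splitRegex, -1)); // -1 keeps trailing empty cells
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ExportedCsvReader <exported.csv>");
            return;
        }
        // Assumption: the export was written as windows-1252 with ';' as separator.
        Charset assumed = Charset.forName("windows-1252");
        for (String[] row : readRows(Paths.get(args[0]), assumed, ";")) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

If the accented characters still come out wrong, the export was probably written in a different encoding (for example UTF-8), so pass that charset instead.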

Good luck and I hope that's what you were implying via your question.

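Code (WorkbookSettings here appears to be the class from the JExcelApi library, jxl.WorkbookSettings):

    import jxl.WorkbookSettings;

    WorkbookSettings workbookSettings = new WorkbookSettings();
    // setEncoding is an instance method, so call it on the object, not on the class.
    workbookSettings.setEncoding("Cp1252");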
