
How to read an Excel (.xls) file like text?

I need to read an Excel (.xls) file that I'm receiving. I tried the usual charsets (UTF-8, Cp1252, ISO-8859-1, UTF-16LE), but none of them helped; the characters are still malformed.

After some searching I ended up using juniversalchardet, which reported the charset as MacCyrillic. I read the file with MacCyrillic, but got the same weird outcome.

When I open the file in Excel everything is fine and all the characters display correctly; since it's Portuguese, it's filled with Ç, ~ and the like. But when I open it with Notepad or through Java, the file is all messed up. If I open the file in Excel and then save it again as .txt, it becomes readable.

My method to find the charset:

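    // Requires juniversalchardet (org.mozilla.universalchardet.UniversalDetector),
    // plus java.io.FileInputStream and java.io.IOException.
    public static void lerCharset(String fileName) throws IOException {
        // A small buffer is enough because the detector is fed incrementally.
        byte[] buf = new byte[4096];

        // try-with-resources closes the stream even if reading fails.
        try (FileInputStream fis = new FileInputStream(fileName)) {
            // (1) create the detector
            UniversalDetector detector = new UniversalDetector(null);

            // (2) feed data until the detector is done or the file ends
            int nread;
            while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
                detector.handleData(buf, 0, nread);
            }

            // (3) signal the end of the input
            detector.dataEnd();

            // (4) read the result
            String encoding = detector.getDetectedCharset();
            if (encoding != null) {
                System.out.println("Detected encoding = " + encoding);
            } else {
                System.out.println("No encoding detected.");
            }

            // (5) reset the detector before any reuse
            detector.reset();
        }
    }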

How can I discover the correct charset? Should I try a different approach, like having my Java program re-save the Excel file and then read it?

If I'm understanding your question, you're trying to read the Excel file as if it were a text file.

The challenge is that .xls files are actually binary files containing the text, formatting, sheet information, macro information, etc.
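One way to confirm this before trying any charset is to look at the first bytes of the file. The following is a minimal sketch, not part of the original answer: the class name `SpreadsheetSniffer` and its methods are hypothetical, and it assumes Java 11+ (for `InputStream.readNBytes`). It relies on the fact that Excel 97-2003 .xls files are stored in an OLE2 compound file, which starts with a fixed 8-byte signature, while .xlsx files are ZIP archives. Very old Excel formats and files that merely carry an .xls extension may not match either signature.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper (not from the original post): guesses the container
// format of a spreadsheet file from its leading bytes.
class SpreadsheetSniffer {
    // Signature of an OLE2 compound file, the container used by binary .xls.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local file header "PK\3\4", the container used by .xlsx.
    private static final byte[] ZIP_MAGIC = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a buffer holding the first bytes of a file.
    static String sniff(byte[] head) {
        if (startsWith(head, OLE2_MAGIC)) return "OLE2 (binary .xls)";
        if (startsWith(head, ZIP_MAGIC)) return "ZIP (.xlsx)";
        return "unknown (possibly plain text)";
    }

    // Reads up to 8 bytes from the file and classifies them.
    static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

If `SpreadsheetSniffer.sniff(Path.of("report.xls"))` (with a hypothetical file name) reports the OLE2 case, no text charset will decode the file as a whole, which would explain why a charset detector returns an implausible guess such as MacCyrillic.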

You'd need to either save the files as .csv (via Excel before running your program, or directly through your program), upgrade them to .xlsx (which numerous libraries can then read as XML), use a library (such as Apache POI or anything similar), or even query the data out using ADO.
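For the .csv route, the exported file is plain text, so an explicit charset does work at that point. Here is a rough sketch using only the JDK; the class `CsvExportReader`, the method `readRows`, and the choice of delimiter are illustrative assumptions, not something from the original answer. The right charset depends on how the file was exported (windows-1252 is a common result on Western European Windows setups, but that is an assumption to verify), and the naive split below does not handle quoted fields that contain the delimiter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: reads a delimited text export with an explicit charset.
class CsvExportReader {
    static List<String[]> readRows(Path file, Charset charset, String delimiter)
            throws IOException {
        List<String[]> rows = new ArrayList<>();
        // Quote the delimiter so characters like "|" are not treated as regex.
        String splitPattern = Pattern.quote(delimiter);
        try (BufferedReader reader = Files.newBufferedReader(file, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Limit -1 keeps trailing empty cells.
                rows.add(line.split(splitPattern, -1));
            }
        }
        return rows;
    }
}
```

A call might look like `CsvExportReader.readRows(Path.of("export.csv"), Charset.forName("windows-1252"), ";")`, where the file name and the semicolon delimiter are placeholders.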

Good luck, and I hope that's what you meant by your question.

Code:

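    // WorkbookSettings comes from the JExcelApi (jxl) library.
    WorkbookSettings workbookSettings = new WorkbookSettings();
    workbookSettings.setEncoding("Cp1252"); // instance method, so call it on the object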
