
How to read a huge HTML file in Java?

I need to read a huge HTML file (around 25MB) and display it in the front end of my application. I have tried several options:

Option 1:
    try (Scanner scnr = new Scanner(file)) {
        while (scnr.hasNextLine()) {
            String line = scnr.nextLine();
        }
    }
Option 2:
    FileUtils.readFileToString(file, "UTF-8");
Option 3:
    IOUtils.toString(new FileInputStream(new File(file)), "UTF-8");

All three options fail to read the file. I see no error: the processing just stops, and the webpage shows an "error" popup with no information.

The problem seems to be that the entire content of the HTML file is read as a single line.

Is there a way in which I can read this file?

I went through several other questions here to see if there is a possible solution, but nothing seems to be working for this case.

@user811433, I tested Apache Commons IO on a log file of around 800MB and it ran without errors.

FileUtils.lineIterator opens an InputStream for the file. When you have finished with the iterator, you should close the stream to free internal resources. This can be done by calling the LineIterator.close() or LineIterator.closeQuietly(LineIterator) method.

To process the file line by line, like a stream, the recommended usage pattern is something like this:

    File file = new File("C:\\Users\\lucas\\Desktop\\file-with-800MB.log");

    LineIterator it = FileUtils.lineIterator(file, "UTF-8");
    try {
        while (it.hasNext()) {
            String line = it.nextLine();
            // do something with line, here just sysout...
            System.out.println(line);
        }
    } finally {
        LineIterator.closeQuietly(it);
    }
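
For comparison, here is a minimal sketch of the same line-by-line pattern using only the JDK (java.nio.file.Files.lines), assuming the file is UTF-8. The class name LineStats and the method lineStats are made up for illustration. Besides counting lines, it records the longest line, which may help confirm whether the file really is one giant line, as the question suspects.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Stream;

// Hypothetical helper, not part of any library.
class LineStats {
    // Returns {line count, length of the longest line}.
    // Lines are read lazily instead of loading the whole file at once.
    static long[] lineStats(Path path) throws IOException {
        long[] stats = {0, 0};
        // try-with-resources closes the underlying file when the stream is done.
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            lines.forEach(line -> {
                stats[0]++;
                stats[1] = Math.max(stats[1], line.length());
            });
        }
        return stats;
    }

    public static void main(String[] args) throws IOException {
        // "test.html" is only a placeholder default.
        Path path = Paths.get(args.length > 0 ? args[0] : "test.html");
        long[] stats = lineStats(path);
        System.out.println("lines: " + stats[0] + ", longest line: " + stats[1] + " chars");
    }
}
```

If the longest line is close to the size of the whole file, any line-based reader still has to hold nearly all of the content in a single String.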


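    // Needs java.io.* and java.nio.charset.StandardCharsets.
    File f = new File("test.html");
    // try-with-resources closes the reader even if reading fails;
    // the charset is set explicitly instead of relying on the platform default.
    try (BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(f), StandardCharsets.UTF_8))) {
        String content;
        while ((content = reader.readLine()) != null) {
            System.out.println(content);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }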
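
If the file really is a single line, as the question suspects, readLine() still returns the whole content as one String. A possible alternative is to read fixed-size chunks of characters and pass each chunk along as it arrives. The sketch below assumes UTF-8 and that the content can be forwarded to some Writer (for example, an HTTP response writer); ChunkedHtmlCopy, copy, and the 8192-character buffer size are illustrative choices, not taken from the question.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of any library.
class ChunkedHtmlCopy {
    // Copies characters from in to out in fixed-size chunks, so memory use
    // does not depend on line length. Returns the number of chars copied.
    static long copy(Reader in, Writer out) throws IOException {
        char[] buffer = new char[8192]; // arbitrary chunk size (assumption)
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // "test.html" is only a placeholder default.
        Path path = Paths.get(args.length > 0 ? args[0] : "test.html");
        // System.out stands in for the real destination and is left open.
        Writer out = new OutputStreamWriter(System.out, StandardCharsets.UTF_8);
        try (Reader in = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            copy(in, out);
        }
    }
}
```

This only addresses reading on the server side; whether the front end can render 25MB of HTML is a separate matter.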


 