Why is this “line count” program slow in Java? Using MappedByteBuffer

To try MappedByteBuffer (memory-mapped files in Java), I wrote a simple wc -l (text file line count) demo:

int wordCount(String fileName) throws IOException {
    FileChannel fc = new RandomAccessFile(new File(fileName), "r").getChannel();
    MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());

    int nlines = 0;
    byte newline = '\n';

    for(long i = 0; i < fc.size(); i++) {
        if(mem.get() == newline)
            nlines += 1;
    }

    return nlines;
}

I tried this on a file of about 15 MB (15,008,641 bytes) with 100k lines. On my laptop, it takes about 13.8 seconds. Why is it so slow?

Complete class code is here: http://pastebin.com/t8PLRGMa

For the reference, I wrote the same idea in C: http://pastebin.com/hXnDvZm6

It runs in about 28 ms, or roughly 490 times faster.

Out of curiosity, I also wrote a Scala version using essentially the same algorithm and APIs as in Java. It runs 10 times faster, which suggests there is definitely something odd going on.

Update: The file is cached by the OS, so there is no disk loading time involved.

I wanted to use memory mapping for random access to bigger files which may not fit into RAM. That is why I am not just using a BufferedReader.

The code is very slow because fc.size() is called in the loop.

The JVM obviously cannot hoist the fc.size() call out of the loop, since the file size can change at run time. Querying the file size is relatively slow, because it requires a system call to the underlying file system.

Change this to

    long size = fc.size();
    for (long i = 0; i < size; i++) {
        ...
    }

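For completeness, here is a minimal sketch of the corrected method with the size cached in a local variable. The LineCount wrapper class, the try-with-resources blocks, and the main method are my additions to make the snippet self-contained; the loop body itself is unchanged from the question.

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

class LineCount {
    static int lineCount(String fileName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(new File(fileName), "r");
             FileChannel fc = raf.getChannel()) {
            long size = fc.size();              // queried once, outside the loop
            MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, size);

            int nlines = 0;
            byte newline = '\n';
            for (long i = 0; i < size; i++) {   // bound is now a plain local, no system call per byte
                if (mem.get() == newline)
                    nlines += 1;
            }
            return nlines;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(lineCount(args[0]));
    }
}

With the size cached, the loop bound can stay in a register, so each iteration is just a buffer read and a comparison instead of a system call followed by a comparison.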