Java code to tail the last n lines of a file, equivalent to Unix tail

Question

Below is the code I wrote to tail the last n lines of a file.

import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class TailCommand {
    public static void main(String args[]) {
        /*
         * Receive file name and no of lines to tail as command line
         * argument
         */
        try (RandomAccessFile randomFile = new RandomAccessFile(args[0], "r")) {
            long numberOfLines = Long.parseLong(args[1]);
            long lineno = 0;
            String str;
            String outstr;
            StringBuilder sb = new StringBuilder();
            Map<Long, String> strmap = new HashMap<Long, String>();
            // Read every line and store it under its 1-based line number
            while ((str = randomFile.readLine()) != null) {
                strmap.put(lineno + 1, str);
                lineno++;
            }
            System.out.println("Total no of lines in file is " + lineno);
            // Line number of the first of the last numberOfLines lines
            long startPosition = lineno - numberOfLines + 1;
            while (startPosition <= lineno) {
                if (strmap.containsKey(startPosition)) {
                    outstr = strmap.get(startPosition);
                    sb.append(outstr);
                    System.out.println(outstr);
                }
                startPosition++;
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

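I used the following approach: receive the file name and the number of lines to tail as command-line arguments.

Use the readLine method to count the total number of lines in the file.
Increment a counter on each readLine call.
Store this counter and the string returned by readLine in a hash map.
As a result, the whole file ends up stored in the hash map.
The hash map keys can then be used to retrieve the contents of specific lines.
A StringBuilder can be used to print the selected lines.

My questions

Is my approach effective? Can I use it for large files, over 10 MB? What improvements do I need to make if several people have to tail the same file at the same time? Can I also use StringBuilder for larger files?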

Answer 1

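As mentioned in my comment on djna's answer, your implementation is not very efficient:

You are reading the whole file. If the file is large and n is small, you are wasting time, I/O, and everything else.
You are also wasting memory.
There is no buffering (other than whatever RandomAccessFile#readLine() may or may not provide), which also slows things down.

So what I would do is read the file backwards in chunks, starting from the end, and process each chunk separately.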

import java.io.IOException;
import java.io.PrintWriter;
import java.io.RandomAccessFile;
import java.util.ArrayList;
import java.util.List;

class ChunkedTail {
    // Prints the last linesToRead lines of the named file to pw
    static void tail(String file, int linesToRead, PrintWriter pw) throws IOException {
        List<String> lines = new ArrayList<String>(); // collected last line first
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            int chunkSize = 1024 * 32;
            long end = raf.length();
            while (lines.size() < linesToRead && end > 0) {
                // Read a chunk that stops at the current end point
                long startPoint = Math.max(0, end - chunkSize);
                int readLen = (int) (end - startPoint);
                byte[] buf = new byte[readLen];
                raf.seek(startPoint);
                raf.readFully(buf);

                // Parse newlines backwards and add each complete line to the list
                int unparsedSize = readLen;
                for (int index = readLen - 1; index >= 0; --index) {
                    if (buf[index] == '\n') {
                        int startOfLine = index + 1;
                        int len = unparsedSize - startOfLine;
                        if (len > 0) {
                            lines.add(new String(buf, startOfLine, len));
                        }
                        unparsedSize = index + 1;
                    }
                }

                if (startPoint == 0) {
                    // Start of file reached: what is left is the first line
                    lines.add(new String(buf, 0, unparsedSize));
                    end = 0;
                } else if (unparsedSize == readLen) {
                    // No complete line fit in this chunk; retry with a larger one
                    chunkSize *= 2;
                } else {
                    // The first line in the chunk may be partial, so leave it
                    // for the next pass
                    end = startPoint + unparsedSize;
                }
            }
        }

        // Only print the requested number of lines
        if (linesToRead > lines.size()) {
            linesToRead = lines.size();
        }
        for (int i = linesToRead - 1; i >= 0; --i) {
            pw.print(lines.get(i));
        }
        pw.flush();
    }
}

Answer 2

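Is my approach effective, and can I use it for large files, over 10 MB?

Yes, it is effective. And yes, you "can" use it for larger files, but because you always scan the entire file, performance degrades as the file grows. Likewise, because you store the entire contents in memory, the memory requirement keeps growing until a very large file starts causing OutOfMemoryError.

What improvements do I need to make if several people have to tail the same file at the same time?

None, since you are only tailing n lines. Everyone can simply run their own instance of the program. If you wanted to follow the file as it is updated over time (as tail -f does), you would have to make some changes.

Can I also use StringBuilder for larger files?

Sure you can, but it is not clear to me what you would gain.

Personally, I would suggest restructuring your algorithm as follows:

Seek to the end of the file.
Parse backwards until you have encountered the required number of \n characters.
Read forward to the end of the file, printing as you go.

This removes the need to buffer every line of the file, and performance does not degrade for very large files.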
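
The three steps above could be sketched roughly as follows. This is only an illustration, not the answerer's code: the class SeekTail and its methods tailStart and tail are hypothetical names, the 8 KB buffer size is arbitrary, and lines are assumed to end in '\n'.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.RandomAccessFile;

class SeekTail {
    /** Returns the byte offset at which the last n lines begin (hypothetical helper). */
    static long tailStart(RandomAccessFile raf, int n) throws IOException {
        long length = raf.length();
        if (n <= 0) {
            return length; // nothing requested
        }
        byte[] buf = new byte[8192];
        long pos = length;
        int newlines = 0;
        // Steps 1 and 2: start at the end and scan backwards, block by block
        while (pos > 0) {
            int len = (int) Math.min(buf.length, pos);
            pos -= len;
            raf.seek(pos);
            raf.readFully(buf, 0, len);
            for (int i = len - 1; i >= 0; i--) {
                if (buf[i] != '\n') {
                    continue;
                }
                long offset = pos + i;
                // A newline in the very last byte ends the last line
                // rather than starting a new one, so it is not counted
                if (offset == length - 1) {
                    continue;
                }
                if (++newlines == n) {
                    return offset + 1;
                }
            }
        }
        return 0; // fewer than n lines: the tail is the whole file
    }

    /** Step 3: read forward from the computed offset, copying bytes to out. */
    static void tail(String path, int n, OutputStream out) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(tailStart(raf, n));
            byte[] buf = new byte[8192];
            int read;
            while ((read = raf.read(buf)) != -1) {
                out.write(buf, 0, read);
            }
            out.flush();
        }
    }

    public static void main(String[] args) throws IOException {
        tail(args[0], Integer.parseInt(args[1]), System.out);
    }
}
```

Because the bytes are copied through unchanged, the sketch never decodes text, so the file's character encoding does not matter as long as 0x0A only ever means a newline (true for ASCII and UTF-8).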

Answer 3

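It seems you are keeping the whole file in memory when you only need to keep n lines. So allocate an array of size n and use it as a ring buffer.

In the code you have shown you do not seem to use the StringBuilder; I guess you are using it to build the output. Since that should depend only on n and not on the size of the file, I do not see why using StringBuilder should be a problem.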

Answer 4

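You are basically reading the whole file into memory; you do not actually need a random access file for that.

If the file is very large, that may not be the best option.

Why not use the HashMap to store (line number -> position in file) instead of (line number -> line)? That way you will know where to seek for the last n lines.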
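
A rough sketch of such an offset index follows. OffsetIndexTail, indexLines and lastLines are hypothetical names chosen for illustration, and the sketch assumes UTF-8 text, '\n' line endings, and a tail small enough to fit in memory. Note that building the index still scans the whole file once; only the memory footprint shrinks.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

class OffsetIndexTail {
    /** Maps each 1-based line number to the byte offset where that line starts. */
    static Map<Long, Long> indexLines(RandomAccessFile raf) throws IOException {
        Map<Long, Long> offsets = new HashMap<>();
        long length = raf.length();
        if (length == 0) {
            return offsets;
        }
        long lineNo = 1;
        offsets.put(lineNo, 0L); // the first line starts at offset 0
        byte[] buf = new byte[8192];
        long pos = 0;
        raf.seek(0);
        int read;
        while ((read = raf.read(buf)) > 0) {
            for (int i = 0; i < read; i++) {
                long next = pos + i + 1;
                // A newline starts a new line unless it is the last byte of the file
                if (buf[i] == '\n' && next < length) {
                    offsets.put(++lineNo, next);
                }
            }
            pos += read;
        }
        return offsets;
    }

    /** Seeks to the start of the n-th line from the end and reads to EOF. */
    static String lastLines(String path, int n) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            Map<Long, Long> offsets = indexLines(raf);
            long total = offsets.size();
            if (total == 0 || n <= 0) {
                return "";
            }
            long first = Math.max(1, total - n + 1);
            long start = offsets.get(first);
            byte[] tail = new byte[(int) (raf.length() - start)];
            raf.seek(start);
            raf.readFully(tail);
            return new String(tail, StandardCharsets.UTF_8); // assumes UTF-8 content
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.print(lastLines(args[0], Integer.parseInt(args[1])));
    }
}
```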

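Another approach is to use a buffer (an array) of n strings holding the last n lines read so far. But be careful: when you read a new line, you do not want to shift every element in the buffer (1 -> 0, 2 -> 1, ..., n -> (n-1)) and then add the new line at the end. Use a circular buffer instead: keep an index to the last position written in the buffer, and overwrite the next position when a new line is added. If you are at position n-1, the next position is 0, and so it wraps around.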
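
The circular buffer described above could look roughly like this. RingBufferTail and lastLines are hypothetical names, and the sketch takes a Reader so that it can be tried without a file; it still reads the input from start to end, but never holds more than n lines.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class RingBufferTail {
    /** Returns the last n lines of source, oldest first, using an n-slot ring. */
    static List<String> lastLines(Reader source, int n) throws IOException {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        String[] ring = new String[n];
        long count = 0; // total number of lines seen so far
        BufferedReader reader = new BufferedReader(source);
        String line;
        while ((line = reader.readLine()) != null) {
            // Overwrite the oldest slot; the index wraps from n-1 back to 0
            ring[(int) (count % n)] = line;
            count++;
        }
        // Walk the ring from the oldest kept line to the newest
        long kept = Math.min(count, n);
        for (long i = count - kept; i < count; i++) {
            result.add(ring[(int) (i % n)]);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // prints [c, d]
        System.out.println(lastLines(new StringReader("a\nb\nc\nd\n"), 2));
    }
}
```

To tail a real file, the same method can be called with a java.io.FileReader (or any other Reader) in place of the StringReader.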

Answer 5

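I have modified my code based on the suggestions above; see the updated code below.

The logic used is as follows:

1. Find the end of the file using the file length.
2. Move the file pointer backwards from EOF and check for occurrences of '\n'.
3. When a '\n' is found, increment the line counter and put the output of readLine into the hash map.
4. Retrieve the values from the hash map in descending order.

I hope the above approach does not cause the memory problems that were evident before. Please advise.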

import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

class NewTailCommand {
    public static void main(String args[]) {
        Map<Long, String> strmap = new HashMap<Long, String>();
        long numberOfLines = Long.parseLong(args[1]);
        /*
         * Receive file name and no of lines to tail as command line
         * argument
         */
        try (RandomAccessFile randomFile = new RandomAccessFile(args[0], "r")) {
            long filelength = randomFile.length();
            long linescovered = 1;
            // Walk backwards from the last byte and stop at the start of the file
            for (long filepos = filelength - 1;
                    linescovered <= numberOfLines && filepos >= 0; filepos--) {
                randomFile.seek(filepos);
                // Ignore the newline that terminates the last line of the file
                if (randomFile.readByte() == 0xA && filepos != filelength - 1) {
                    // The file pointer now sits just after the newline
                    strmap.put(linescovered, randomFile.readLine());
                    linescovered++;
                }
                // The first line has no newline before it, so read it explicitly
                if (filepos == 0 && linescovered <= numberOfLines) {
                    randomFile.seek(0);
                    strmap.put(linescovered, randomFile.readLine());
                    linescovered++;
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Lines were stored last to first, so print in descending key order
        for (long key = numberOfLines; key > 0; key--) {
            if (strmap.containsKey(key)) {
                System.out.println(strmap.get(key));
            }
        }
    }
}

5 answers

Answer 1
5 2011-07-31 07:12:14

Answer 2
3 2011-07-31 07:06:07

Answer 3
0 2011-07-31 06:56:08

Answer 4
0 2011-07-31 06:58:26

Answer 5
0 2011-08-15 06:59:50
