
Maximum line length for BufferedReader.readLine() in Java?

I use BufferedReader's readLine() method to read lines of text from a socket.

There is no obvious way to limit the length of the line read.

I am worried that the source of the data can (maliciously or by mistake) write a lot of data without any line feed character, and this will cause BufferedReader to allocate an unbounded amount of memory.

Is there a way to avoid that? Or do I have to implement a bounded version of readLine() myself?

The simplest way to do this will be to implement your own bounded line reader.

Or even simpler, reuse the code from this BoundedBufferedReader class.

Actually, coding a readLine() that works the same as the standard method is not trivial. Dealing with the 3 kinds of line terminator (LF, CR, and CR followed by LF) CORRECTLY requires some pretty careful coding. It is interesting to compare the different approaches of the above link with the Sun version and Apache Harmony version of BufferedReader.

Note: I'm not entirely convinced that either the bounded version or the Apache version is 100% correct. The bounded version assumes that the underlying stream supports mark and reset, which is certainly not always true. The Apache version appears to read ahead one character if it sees a CR as the last character in the buffer. This would break on MacOS when reading input typed by the user. The Sun version handles this by setting a flag to cause the possible LF after the CR to be skipped on the next read... operation; i.e. no spurious read-ahead.
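As a rough illustration of the flag-based approach described above, here is a minimal sketch of a bounded line reader. The class name BoundedLineReader and the maxLineLength parameter are hypothetical, not part of the JDK or of the linked class, and throwing an IOException on an overlong line is just one possible policy (truncating or discarding the rest would be others). It reads one char at a time, so it assumes the Reader passed in is already buffered.

```java
import java.io.IOException;
import java.io.Reader;

/** Hypothetical helper: reads lines but never buffers more than maxLineLength chars. */
final class BoundedLineReader {
    private final Reader in;            // assumed to be buffered (e.g. a BufferedReader)
    private final int maxLineLength;    // illustrative limit, in chars
    private boolean skipLF = false;     // set after a CR so a following LF is dropped lazily

    BoundedLineReader(Reader in, int maxLineLength) {
        this.in = in;
        this.maxLineLength = maxLineLength;
    }

    /** Returns the next line without its terminator, or null at end of stream. */
    String readLine() throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (skipLF) {
                skipLF = false;
                if (c == '\n') {
                    continue;           // second half of a CR LF pair, already reported
                }
            }
            if (c == '\n') {
                return sb.toString();
            }
            if (c == '\r') {
                skipLF = true;          // do not read ahead; decide on the next call
                return sb.toString();
            }
            if (sb.length() >= maxLineLength) {
                throw new IOException("line longer than " + maxLineLength + " chars");
            }
            sb.append((char) c);
        }
        // End of stream: return a final unterminated line, or null if nothing is left.
        return sb.length() > 0 ? sb.toString() : null;
    }
}
```

Because the LF after a CR is only skipped on the next call, the reader never blocks waiting for a character that may not come, which is the property the Sun version is credited with above.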

Another option is Apache Commons' BoundedInputStream:

InputStream bounded = new BoundedInputStream(is, MAX_BYTE_COUNT);
BufferedReader reader = new BufferedReader(new InputStreamReader(bounded));
String line = reader.readLine();

Perhaps the easiest solution is to take a slightly different approach. Instead of attempting to prevent a DoS by limiting one particular read, limit the entire amount of raw data read. In this way you don't need to worry about using special code for every single read and loop, so long as the memory allocated is proportionate to incoming data.

You can either meter the Reader or, probably more appropriately, the undecoded Stream or equivalent.
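A metering stream of this kind could look roughly like the sketch below. MeteredInputStream and maxBytes are hypothetical names chosen for illustration; the class counts bytes delivered by read() and fails once the total exceeds the budget. It does not count bytes passed over by skip(), and a bulk read may run past the limit by up to one caller-sized buffer before the exception is raised.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical wrapper: fails once more than maxBytes have been read in total. */
final class MeteredInputStream extends FilterInputStream {
    private final long maxBytes;   // illustrative overall budget for this stream
    private long count = 0;        // bytes handed to the caller so far

    MeteredInputStream(InputStream in, long maxBytes) {
        super(in);
        this.maxBytes = maxBytes;
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b != -1) {
            account(1);
        }
        return b;
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            account(n);
        }
        return n;
    }

    // Adds n to the running total and rejects the stream once the budget is exceeded.
    private void account(long n) throws IOException {
        count += n;
        if (count > maxBytes) {
            throw new IOException("input exceeded " + maxBytes + " bytes");
        }
    }
}
```

It would sit between the socket and the decoder, for example new BufferedReader(new InputStreamReader(new MeteredInputStream(socket.getInputStream(), limit))), so that an ordinary readLine() can no longer grow without bound.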

A String can hold at most about 2 billion chars (Integer.MAX_VALUE). If you want the limit to be smaller, you need to read the data yourself. You can read one char at a time from the buffered stream until the limit or a newline char is reached.

There are a few ways round this:

  • if the amount of data overall is very small, load data in from the socket into a buffer (byte array, bytebuffer, depending on what you prefer), then wrap the BufferedReader around the data in memory (via a ByteArrayInputStream etc);
  • just catch the OutOfMemoryError, if it occurs; catching this error is generally not reliable, but in the specific case of catching array allocation failures, it is basically safe (but does not solve the issue of any knock-on effect that one thread allocating large amounts from the heap could have on other threads running in your application, for example);
  • implement a wrapper InputStream that will only read so many bytes, then insert this between the socket and BufferedReader;
  • ditch BufferedReader and split your lines via the regular expressions framework (implement a CharSequence whose chars are pulled from the stream, and then define a regular expression that limits the length of lines); in principle, a CharSequence is supposed to be random access, but for a simple "line splitting" regex, in practice you will probably find that successive chars are always requested, so that you can "cheat" in your implementation.
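The first option in the list above might be sketched as follows. InMemoryLines, bufferAll and maxBytes are hypothetical names, and UTF-8 is an assumed encoding; the sketch also assumes the peer closes (or half-closes) the connection after sending, since it reads until end of stream.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: copies a small message into memory before any line parsing. */
final class InMemoryLines {
    /** Reads the whole stream, rejecting it if it is larger than maxBytes. */
    static BufferedReader bufferAll(InputStream in, int maxBytes) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            if (out.size() + n > maxBytes) {
                throw new IOException("message larger than " + maxBytes + " bytes");
            }
            out.write(chunk, 0, n);
        }
        // From here on, readLine() can never allocate more than the buffered data.
        return new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(out.toByteArray()), StandardCharsets.UTF_8)); // UTF-8 is an assumption
    }
}
```

The size check happens on raw bytes before decoding, so the memory used is bounded by maxBytes plus one chunk regardless of how the lines are terminated.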

In BufferedReader, instead of using String readLine(), use int read(char[] cbuf, int off, int len); you can then use boolean ready() to check whether more input is available without blocking, and convert what you read into a string using the constructor String(char[] value, int offset, int count).

If you don't care about the whitespace and you just want to have a maximum number of characters per line, then the proposal Stephen suggested is really simple:

import java.io.BufferedReader;
import java.io.IOException;

public class BoundedReader extends BufferedReader {

    private final int  bufferSize;
    private       char buffer[];

    BoundedReader(final BufferedReader in, final int bufferSize) {
        super(in);
        this.bufferSize = bufferSize;
        this.buffer     = new char[bufferSize];
    }

    @Override
    public String readLine() throws IOException {
        int no;

        /* read up to bufferSize */
        if((no = this.read(buffer, 0, bufferSize)) == -1) return null;
        String input = new String(buffer, 0, no).trim();

        /* skip the rest */
        while(no >= bufferSize && ready()) {
            if((no = read(buffer, 0, bufferSize)) == -1) break;
        }

        return input;
    }

}

Edit: this is intended to read lines from a user terminal. It blocks until the next line and returns a bufferSize-bounded String; any further input on the line is discarded.
