Properly streaming both input and output in an HTTP Servlet

I'm trying to write a servlet that handles a POST request and streams both input and output. That is, it should read one line of input, do some work on that line, and write one line of output. It should handle arbitrarily long requests (and therefore produce arbitrarily long responses) without running out of memory. Here's my first attempt:

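// lineIterator is statically imported from org.apache.commons.io.IOUtils
protected void doPost(HttpServletRequest request, HttpServletResponse response) throws IOException {
    ServletInputStream input = request.getInputStream();
    ServletOutputStream output = response.getOutputStream();

    // read the request body one line at a time instead of loading it all
    LineIterator lineIt = lineIterator(input, "UTF-8");
    while (lineIt.hasNext()) {
        String line = lineIt.next();
        output.println(line.length()); // one line of output per line of input
    }
    output.flush();
}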

I tested this servlet with curl and it works, but when I wrote a client using Apache HttpClient, both the client thread and the server thread hang. The client looks like this:

HttpClient client = HttpClientBuilder.create().build();
HttpPost post = new HttpPost(...);

// request
post.setEntity(new FileEntity(new File("some-huge-file.txt")));
HttpResponse response = client.execute(post);

// response
copyInputStreamToFile(response.getEntity().getContent(), new File("results.txt"));

The issue is obvious. The client does its job sequentially in one thread: first it sends the request completely, and only then starts reading the response. But the server writes one line of output for every line of input. If the client is not reading the output (and a sequential client isn't), the response fills the buffers between server and client, and the server blocks trying to write to the output stream. The blocked server stops reading input, which in turn blocks the client trying to send the rest of the request.
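The mutual blocking described above can be reproduced without any network code. The following is a minimal model, not the real servlet or HttpClient: two bounded `ArrayBlockingQueue`s stand in for the socket buffers in each direction, and the class name, the `<EOF>` sentinel, the queue capacity, and the 200 ms timeout are all assumptions made for illustration. The sequential client uses a timed `offer` so that the stall is reported instead of hanging the program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class StreamingDeadlockModel {
    private static final String EOF = "<EOF>"; // sentinel marking end of stream

    /** "Server": answers every input line with its length, one line at a time. */
    static Thread startServer(BlockingQueue<String> in, BlockingQueue<String> out) {
        Thread server = new Thread(() -> {
            try {
                for (String line = in.take(); !line.equals(EOF); line = in.take()) {
                    out.put(String.valueOf(line.length())); // blocks while `out` is full
                }
                out.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        server.setDaemon(true);
        server.start();
        return server;
    }

    /** Sends the whole request, then reads. Returns false if sending stalls. */
    static boolean sequentialClient(int lines, int capacity) {
        BlockingQueue<String> up = new ArrayBlockingQueue<>(capacity);
        BlockingQueue<String> down = new ArrayBlockingQueue<>(capacity);
        Thread server = startServer(up, down);
        try {
            for (int i = 0; i <= lines; i++) {
                String item = i < lines ? "line " + i : EOF;
                if (!up.offer(item, 200, TimeUnit.MILLISECONDS)) {
                    return false; // both sides are now waiting on each other
                }
            }
            while (!down.take().equals(EOF)) { /* drain the response */ }
            return true;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            server.interrupt(); // release the server if it is stuck
        }
    }

    /** Sends on a second thread while the calling thread reads the response. */
    static List<String> concurrentClient(int lines, int capacity) {
        BlockingQueue<String> up = new ArrayBlockingQueue<>(capacity);
        BlockingQueue<String> down = new ArrayBlockingQueue<>(capacity);
        startServer(up, down);
        Thread sender = new Thread(() -> {
            try {
                for (int i = 0; i < lines; i++) up.put("line " + i);
                up.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();
        List<String> results = new ArrayList<>();
        try {
            for (String r = down.take(); !r.equals(EOF); r = down.take()) results.add(r);
            sender.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println("sequential completed: " + sequentialClient(100, 4));
        System.out.println("concurrent results:   " + concurrentClient(100, 4).size());
    }
}
```

With 100 lines and a capacity of 4, the sequential client stalls once both queues are full, while the concurrent client receives all 100 results. A request small enough to fit in the buffers completes either way, which is why the problem only shows up with large files.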

I suppose curl works because it somehow sends input and receives output concurrently (in separate threads?). So the first question is: can Apache HttpClient be configured to behave like curl?

The next question is how to improve the servlet so that an ill-behaved client won't cause server threads to hang. My first attempt is to introduce an intermediate buffer that gathers the output until the client finishes sending input; only then does the servlet start sending output:

ServletInputStream input = request.getInputStream();
ServletOutputStream output = response.getOutputStream();

// prepare intermediate store
int threshold = 100 * 1024; // 100 kB before switching to file store
File file = File.createTempFile("intermediate", "");
try {
    DeferredFileOutputStream intermediate = new DeferredFileOutputStream(threshold, file);

    // process request to intermediate store
    PrintStream intermediateFront = new PrintStream(new BufferedOutputStream(intermediate));
    LineIterator lineIt = lineIterator(input, "UTF-8");
    while (lineIt.hasNext()) {
        String line = lineIt.next();
        intermediateFront.println(line.length());
    }
    intermediateFront.close(); // the store must be closed before writeTo() is called

    // request fully processed, so now it's time to send response
    intermediate.writeTo(output);
} finally {
    file.delete(); // remove the temp file even if processing fails
}

This works, and an ill-behaved client can use my servlet safely. On the other hand, for concurrent clients like curl this solution adds unnecessary latency. A parallel client reads the response in a separate thread, so it would benefit from the response being produced line by line as the request is consumed.

So I think I need a byte buffer/queue that:

  • can be written by one thread and read by another
  • is initially held only in memory
  • overflows to disk if necessary (like DeferredFileOutputStream).

In the servlet I'll spawn a new thread to read the input, process it, and write the output to the buffer, while the main servlet thread reads from this buffer and sends it to the client.
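A buffer with the three properties listed above could be sketched with plain JDK classes as follows. `SpillBuffer` and all of its methods are hypothetical names, not an existing library. The sketch makes simplifying assumptions: one producer and one consumer, a single lock, and no reclaiming of memory or disk space that the consumer has already read, so the temp file grows by everything written past the threshold.

```java
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

/** Hypothetical single-producer, single-consumer byte buffer that spills to disk. */
public class SpillBuffer implements Closeable {
    private final byte[] memory;    // the first memory.length bytes are kept here
    private File spillFile;         // created lazily on first overflow
    private RandomAccessFile spill;
    private long written;           // total bytes accepted from the producer
    private long consumed;          // total bytes handed to the consumer
    private boolean finished;       // producer has signalled end of data

    public SpillBuffer(int threshold) {
        memory = new byte[threshold];
    }

    /** Never waits for the consumer: bytes beyond the threshold go to a temp file. */
    public synchronized void write(byte[] data, int off, int len) throws IOException {
        if (written < memory.length) {
            int n = (int) Math.min(len, memory.length - written);
            System.arraycopy(data, off, memory, (int) written, n);
            written += n; off += n; len -= n;
        }
        if (len > 0) {
            if (spill == null) {
                spillFile = File.createTempFile("spill", ".bin");
                spill = new RandomAccessFile(spillFile, "rw");
            }
            spill.seek(written - memory.length);
            spill.write(data, off, len);
            written += len;
        }
        notifyAll(); // wake a consumer waiting for data
    }

    public synchronized void finish() {
        finished = true;
        notifyAll();
    }

    /** Blocks until data is available; returns -1 after finish() once all bytes are read. */
    public synchronized int read(byte[] buf) throws IOException, InterruptedException {
        while (consumed == written && !finished) {
            wait();
        }
        if (consumed == written) {
            return -1;
        }
        int n;
        if (consumed < memory.length) { // serve from the in-memory part first
            n = (int) Math.min(buf.length, Math.min(written, memory.length) - consumed);
            System.arraycopy(memory, (int) consumed, buf, 0, n);
        } else {                        // then from the spill file
            spill.seek(consumed - memory.length);
            n = spill.read(buf, 0, (int) Math.min(buf.length, written - consumed));
        }
        consumed += n;
        return n;
    }

    @Override
    public synchronized void close() throws IOException {
        if (spill != null) {
            spill.close();
            spillFile.delete();
        }
    }

    /** Demo: a producer thread writes `input` while the caller drains the buffer. */
    public static byte[] roundTrip(byte[] input, int threshold) {
        try (SpillBuffer buffer = new SpillBuffer(threshold)) {
            Thread producer = new Thread(() -> {
                try {
                    for (int i = 0; i < input.length; i += 5) {
                        buffer.write(input, i, Math.min(5, input.length - i));
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                } finally {
                    buffer.finish(); // always release the consumer
                }
            });
            producer.start();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[7];
            for (int n = buffer.read(chunk); n != -1; n = buffer.read(chunk)) {
                out.write(chunk, 0, n);
            }
            producer.join();
            return out.toByteArray();
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In the servlet, the spawned thread would play the producer role (calling `write` per processed line and `finish` at end of input), and the main servlet thread would loop on `read` and copy each chunk to the response stream. Because `write` never waits for the consumer, a client that does not read its response cannot stall the thread that is draining the request.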

Do you know of any library that does this? Or maybe my assumptions are wrong and I should do something completely different...

To write and read simultaneously, you can use the Jetty HttpClient: http://www.eclipse.org/jetty/documentation/current/http-client-api.html

I've created a pull request to your repo with this code.

HttpClient httpClient = new HttpClient();
httpClient.start();

Request request = httpClient.newRequest("http://localhost:8080/line-lengths");
final OutputStreamContentProvider contentProvider = new OutputStreamContentProvider();
InputStreamResponseListener responseListener = new InputStreamResponseListener();

request.content(contentProvider).method(HttpMethod.POST).send(responseListener); //async request
httpClient.getExecutor().execute(new Runnable() {
    public void run() {
        try (OutputStream outputStream = contentProvider.getOutputStream()) {
            writeRequestBodyTo(outputStream); //writing to stream in another thread
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
});

readResponseBodyFrom(responseListener.getInputStream()); //reading response
httpClient.stop();
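The snippet above does not show `writeRequestBodyTo` or `readResponseBodyFrom`. One possible shape for them, assuming the request body comes from a file and the response goes to a file as in the question's client, is sketched below. The class name, the two `Path` fields, and `selfCheck` are hypothetical and exist only to make the sketch self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class StreamHelpers {
    // Hypothetical locations, reusing the file names from the question's client.
    static Path requestFile = Paths.get("some-huge-file.txt");
    static Path resultFile = Paths.get("results.txt");

    /** Streams the request file to `out` without loading it into memory. */
    static void writeRequestBodyTo(OutputStream out) throws IOException {
        Files.copy(requestFile, out);
    }

    /** Streams the response body to the result file, replacing any earlier result. */
    static void readResponseBodyFrom(InputStream in) throws IOException {
        Files.copy(in, resultFile, StandardCopyOption.REPLACE_EXISTING);
    }

    /** Runs both helpers against temp files and returns what ended up in the result. */
    static String selfCheck(String content) {
        try {
            requestFile = Files.createTempFile("request", ".txt");
            resultFile = Files.createTempFile("result", ".txt");
            try {
                Files.write(requestFile, content.getBytes(StandardCharsets.UTF_8));
                ByteArrayOutputStream wire = new ByteArrayOutputStream();
                writeRequestBodyTo(wire);
                readResponseBodyFrom(new ByteArrayInputStream(wire.toByteArray()));
                return new String(Files.readAllBytes(resultFile), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(requestFile);
                Files.deleteIfExists(resultFile);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both helpers copy in fixed-size chunks internally, so memory use stays constant regardless of the file size.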
