Incrementally processing Twitter's streaming API using Apache HttpClient?

Question

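I am using Apache HttpClient 4 to connect to Twitter's streaming API with default-level access. It works well at first, but after a few minutes of retrieving data it fails with this error: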

2012-03-28 16:17:00,040 DEBUG org.apache.http.impl.conn.SingleClientConnManager: Get connection for route HttpRoute[{tls}->http://myproxy:80->https://stream.twitter.com:443]
2012-03-28 16:17:00,040 WARN com.cloudera.flume.core.connector.DirectDriver: Exception in source: TestTwitterSource
java.lang.IllegalStateException: Invalid use of SingleClientConnManager: connection still allocated.
Make sure to release the connection before allocating another one.
    at org.apache.http.impl.conn.SingleClientConnManager.getConnection(SingleClientConnManager.java:216)
    at org.apache.http.impl.conn.SingleClientConnManager$1.getConnection(SingleClientConnManager.java:190)

I understand why I am facing this issue. I am trying to use this HttpClient as a Flume source in a Flume cluster. The code looks like this:

public Event next() throws IOException, InterruptedException {

    try {

        HttpHost target = new HttpHost("stream.twitter.com", 443, "https");
        new BasicHttpContext();
        HttpPost httpPost = new HttpPost("/1/statuses/filter.json");
        StringEntity postEntity = new StringEntity("track=birthday",
                "UTF-8");
        postEntity.setContentType("application/x-www-form-urlencoded");
        httpPost.setEntity(postEntity);
        HttpResponse response = httpClient.execute(target, httpPost,
                new BasicHttpContext());
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                response.getEntity().getContent()));
        String line = null;
        StringBuffer buffer = new StringBuffer();
        while ((line = reader.readLine()) != null) {
            buffer.append(line);
            if(buffer.length()>30000) break;
        }
        return new EventImpl(buffer.toString().getBytes());
    } catch (IOException ie) {
        throw ie;
    }

}

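I am trying to buffer 30,000 characters from the response stream into a StringBuffer and then return that as the data received. I am obviously not closing the connection, but I don't think I want to close it yet. Twitter's developer guide discusses this, and it reads: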

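Some HTTP client libraries only return the response body after the connection has been closed by the server. These clients will not work for accessing the Streaming API. You must use an HTTP client that will return response data incrementally. Most robust HTTP client libraries will provide this functionality. The Apache HttpClient will handle this use case, for example.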

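It clearly says that HttpClient will return response data incrementally. I have gone through the examples and tutorials, but I haven't found anything that comes close to doing this. If you have used an HTTP client (Apache or otherwise) to read Twitter's streaming API incrementally, please let me know how you achieved it. Those who haven't, please feel free to contribute to the answers as well. TIA.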

Update

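I tried the following: 1) I moved acquiring the stream handle into the open method of the Flume source. 2) I used a plain InputStream and read the data into a byte buffer. So this is what the method body looks like now: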

        byte[] buffer = new byte[30000];

        while (true) {
            int count = instream.read(buffer);
            if (count == -1)
                continue;
            else
                break;
        }
        return new EventImpl(buffer);

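This works to an extent: I get tweets, and they are written to the destination nicely. The problem is with the return value of instream.read(buffer). Even when there is no more data on the stream, the buffer still holds its default \u0000 bytes (30,000 of them), so those get written to the destination too. The destination file ends up looking like this: "tweets..tweets..tweeets..\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000...tweets..tweets...". I understand that count won't return -1 because this is a never-ending stream, so how do I figure out from the read call whether the buffer has new content?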
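For readers following the update above, here is a minimal sketch of one way to use the value returned by `InputStream.read(byte[])`. That call blocks until at least one byte is available and returns how many bytes it actually stored, so only the first `count` bytes of the buffer hold new data. The class name `ChunkReader` and the method `readChunk` are hypothetical names introduced for illustration, and a `ByteArrayInputStream` stands in for the Twitter response stream.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of HttpClient or Flume.
final class ChunkReader {

    /**
     * Reads one chunk and returns only the bytes that were actually read.
     * Returns null when the stream has ended. maxBytes must be positive.
     */
    static byte[] readChunk(InputStream in, int maxBytes) throws IOException {
        byte[] buffer = new byte[maxBytes];
        // Blocks until at least one byte is available, then returns the byte count.
        int count = in.read(buffer);
        if (count == -1) {
            return null; // end of stream
        }
        // Trim the unused tail so no \u0000 padding reaches the destination.
        return Arrays.copyOf(buffer, count);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the response stream; an assumption for this demo.
        InputStream in = new ByteArrayInputStream("tweet".getBytes(StandardCharsets.UTF_8));
        byte[] chunk = readChunk(in, 30000);
        System.out.println(chunk.length); // prints 5
    }
}
```

In the method body from the update, the equivalent change would be to build the event from the first `count` bytes of the buffer rather than from the whole 30,000-byte array.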

Answer 1

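The problem is that your code is leaking connections. Make sure that, no matter what, you either close the content stream or abort the request.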

    InputStream instream = response.getEntity().getContent();
    try {
        BufferedReader reader = new BufferedReader(
               new InputStreamReader(instream));
        String line = null;
        StringBuffer buffer = new StringBuffer();
        while ((line = reader.readLine()) != null) {
            buffer.append(line);
            if (buffer.length()>30000) {
               httpPost.abort();
               // connection will not be re-used
               break;
            }
        }
        return new EventImpl(buffer.toString().getBytes());
    } finally {
        // if request is not aborted the connection can be re-used
        try {
          instream.close();
        } catch (IOException ex) {
          // log or ignore
        }
    }
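An alternative that fits the question's update (acquiring the stream once in the source's open method) is to keep a single reader alive and pull one batch of lines per call, so the request is executed only once. The following is a rough sketch under that assumption. `StreamingLineSource` and `nextBatch` are hypothetical names, the `InputStream` is assumed to come from `response.getEntity().getContent()`, and skipping blank lines is an assumption that such lines carry no data. How the connection is finally released (closing the stream or aborting the request, as described above) is left to the caller.

```java
import java.io.BufferedReader;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that wraps one long-lived response stream.
final class StreamingLineSource implements Closeable {
    private final BufferedReader reader;

    StreamingLineSource(InputStream content) {
        this.reader = new BufferedReader(
                new InputStreamReader(content, StandardCharsets.UTF_8));
    }

    /**
     * Reads lines until at least minChars characters are collected or the
     * stream ends. Returns null if nothing could be read.
     * minChars should be positive.
     */
    String nextBatch(int minChars) throws IOException {
        StringBuilder batch = new StringBuilder();
        String line;
        while (batch.length() < minChars && (line = reader.readLine()) != null) {
            if (line.isEmpty()) {
                continue; // assumption: blank lines carry no data
            }
            batch.append(line).append('\n');
        }
        return batch.length() == 0 ? null : batch.toString();
    }

    @Override
    public void close() throws IOException {
        // Closes the reader and, with it, the underlying content stream.
        reader.close();
    }
}
```

With this shape, the source's open method would construct the `StreamingLineSource` once, each `next()` call would wrap the result of `nextBatch(...)` in an event, and the source's close method would release the stream.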

Answer 2

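It turned out to be a Flume issue. Flume is optimized to transfer events of 32 KB in size, and it fails on anything beyond 32 KB. (The workaround is to tune the event size to be greater than 32 KB.) So I changed my code to buffer at least 20,000 characters. This works, but it is not foolproof. It can still fail if the buffer length exceeds 32 KB; however, it has not failed so far in an hour of testing. I believe that has to do with the fact that Twitter doesn't send a lot of data on its public stream.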

    while ((line = reader.readLine()) != null) {
        buffer.append(line);
        if (buffer.length() > 20000) break;
    }
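Because the check above counts characters while the limit is in bytes, and a single character can take several bytes in UTF-8, one way to make the limit harder to exceed is to track the encoded size directly. The sketch below illustrates that idea on an in-memory list of lines. `EventBatcher` and `batch` are hypothetical names, and the 32 KB figure in the usage note is taken from the answer above rather than from Flume's documentation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Flume.
final class EventBatcher {

    /**
     * Groups lines into payloads whose UTF-8 size never exceeds maxBytes.
     * Each line is stored followed by a '\n' separator.
     */
    static List<byte[]> batch(List<String> lines, int maxBytes) {
        List<byte[]> events = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String line : lines) {
            // Encoded size of the line plus one byte for the '\n' separator.
            int lineBytes = line.getBytes(StandardCharsets.UTF_8).length + 1;
            if (lineBytes > maxBytes) {
                throw new IllegalArgumentException("single line exceeds the byte limit");
            }
            if (currentBytes + lineBytes > maxBytes) {
                // Flush the current payload before it would grow past the limit.
                events.add(current.toString().getBytes(StandardCharsets.UTF_8));
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(line).append('\n');
            currentBytes += lineBytes;
        }
        if (currentBytes > 0) {
            events.add(current.toString().getBytes(StandardCharsets.UTF_8));
        }
        return events;
    }
}
```

Calling `EventBatcher.batch(lines, 32 * 1024)` would then yield payloads that each stay at or below 32 KB, provided no single line is larger than that.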

Answer 1
0 2012-03-28 19:56:19

Answer 2
0 Accepted 2012-04-01 13:25:48
