Variety of HTTPS errors while communicating with a server from an Android app

UPDATE: 04 Jan 2015

I still have these issues. Our user base has grown, and I now see all kinds of network errors. Our app sends out an email every time a network-related error occurs in the app.

Our app performs financial transactions, so re-submits are not really idempotent, and we are very wary of enabling HttpClient's retry feature. We have added some response caching on the server to handle re-submits made explicitly by the user. However, we still have no solution that works without a bad user experience.
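
One common way to make re-submits safe is to tag each user operation with a unique request ID and have the server replay the stored response when it sees the same ID again. The sketch below illustrates that idea only; it is not the server code from this post. The class name `IdempotentResponseCache`, the `requestId` parameter, and the use of a `String` response are all hypothetical, and the in-memory map is an assumption made to keep the example short.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical server-side helper: executes a transaction at most once per request ID.
final class IdempotentResponseCache {
    // Assumption: an in-memory map; a real service would likely need persistence and expiry.
    private final ConcurrentMap<String, String> responses = new ConcurrentHashMap<>();

    /**
     * Runs the transaction the first time a requestId is seen and stores its response.
     * Later calls with the same requestId return the stored response without
     * running the transaction again.
     */
    String handle(String requestId, Supplier<String> transaction) {
        // ConcurrentHashMap.computeIfAbsent applies the function at most once per key.
        return responses.computeIfAbsent(requestId, id -> transaction.get());
    }
}
```

With something like this in place, the client could generate one ID per user operation (for example with `java.util.UUID.randomUUID()`) and send the same ID on every retry of that operation, so a retry after a timeout cannot apply the transaction twice.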

Original Question

I have an Android app which posts data as part of a user operation. The data includes a few images; I package them as a Protobuf message (in effect, a byte array) and post it to the server over an HTTPS connection.

Though the app works fine for the most part, we are seeing connection errors occasionally. The issue has become more pronounced now that we have some users in relatively slow network areas (2G connections). However, the issue is not limited to slow connections; it is also seen with customers using WiFi and 3G connections.

Here are a few exceptions we notice in our app logs:

The one below happens after 5 minutes, as I had set the socket timeout to 5 minutes. The app was trying to post 145 KB of data in this case.

Stack trace
java.net.SocketTimeoutException: Read timed out
    at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_read(Native Method)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLInputStream.read(OpenSSLSocketImpl.java:662)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:103)
    at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:191)

The one below happened after 2.5 minutes (the socket timeout was set to 5 minutes); the client was sending 144 KB of data.

javax.net.ssl.SSLException: Write error: ssl=0x5e4f4640: I/O error during system call, Broken pipe
    at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_write(Native Method)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLOutputStream.write(OpenSSLSocketImpl.java:704)
    at org.apache.http.impl.io.AbstractSessionOutputBuffer.write(AbstractSessionOutputBuffer.java:109)
    at org.apache.http.impl.io.ContentLengthOutputStream.write(ContentLengthOutputStream.java:113)

The one below happened after 1 minute.

Stack trace
javax.net.ssl.SSLException: Connection closed by peer
    at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_do_handshake(Native Method)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:378)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLInputStream.<init>(OpenSSLSocketImpl.java:634)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.getInputStream(OpenSSLSocketImpl.java:605)

The one below happened after 77 seconds.

Stack trace
javax.net.ssl.SSLException: SSL handshake aborted: ssl=0x5e2baf00: I/O error during system call, Connection reset by peer
    at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_do_handshake(Native Method)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:378)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl$SSLInputStream.<init>(OpenSSLSocketImpl.java:634)
    at org.apache.harmony.xnet.provider.jsse.OpenSSLSocketImpl.getInputStream(OpenSSLSocketImpl.java:605)
    at org.apache.http.impl.io.SocketInputBuffer.<init>(SocketInputBuffer.java:70)

The one below happened after 15 seconds (the connect timeout is set to 15 seconds).

Time Taken : 15081
Stack trace
org.apache.http.conn.ConnectTimeoutException: Connect to /103.xx.xx.xx:443 timed out
    at org.apache.http.conn.scheme.PlainSocketFactory.connectSocket(PlainSocketFactory.java:121)
    at org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:144)
    at org.apache.http.impl.conn.AbstractPoolEntry.open(AbstractPoolEntry.java:164)
    at org.apache.http.impl.conn.AbstractPooledConnAdapter.open(AbstractPooledConnAdapter.java:119)
    at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:365)

Here are the source code snippets that I use for posting the request:

HttpParams params = new BasicHttpParams();
HttpConnectionParams.setConnectionTimeout(params, 15000); //15 seconds
HttpConnectionParams.setSoTimeout(params, 300000); // 5 minutes

HttpClient client = getHttpClient(params);
HttpPost post = new HttpPost(uri);
post.setEntity(new ByteArrayEntity(requestByteArray));
HttpResponse httpResponse = client.execute(post);

    ....

public static HttpClient getHttpClient(HttpParams params) {
    try {
        KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
        trustStore.load(null, null);

        SSLSocketFactory sf = new TrustAllCertsSSLSocketFactory(trustStore);
        sf.setHostnameVerifier(SSLSocketFactory.STRICT_HOSTNAME_VERIFIER);


        HttpProtocolParams.setVersion(params, HttpVersion.HTTP_1_1);
        HttpProtocolParams.setContentCharset(params, HTTP.UTF_8);

        SchemeRegistry registry = new SchemeRegistry();
        registry.register(new Scheme("http", PlainSocketFactory.getSocketFactory(), 80));
        registry.register(new Scheme("https", sf, 443));

        ClientConnectionManager ccm = new ThreadSafeClientConnManager(params, registry);
        DefaultHttpClient client = new DefaultHttpClient(ccm, params);
        // below line of code will disable the retrying of HTTP request when connection is timed
        // out.

        client.setHttpRequestRetryHandler(new DefaultHttpRequestRetryHandler(0, false));
        return client;
    } catch (Exception e) {
        return new DefaultHttpClient();
    }
}

I have read some forums indicating that we should use the HttpURLConnection class. I did make code changes to use https://code.google.com/p/basic-http-client/ as a hot fix. Though it worked on my Samsung phone, it seemed to have some issue on the phone a customer was using; it was not even able to connect to our site. I had to roll it back, though I can take another look at it if the root cause can be pinned to DefaultHttpClient.
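
For comparison, the same POST written directly against HttpURLConnection could look roughly like the sketch below. This is an illustration under stated assumptions, not the hot fix described above: the class name `UrlConnectionPoster`, the `application/octet-stream` content type, and treating anything other than HTTP 200 as a failure are all assumptions, since the post does not say what the server expects.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper that posts a byte array and returns the response body.
final class UrlConnectionPoster {
    static byte[] post(URL url, byte[] body, int connectTimeoutMs, int readTimeoutMs)
            throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            conn.setConnectTimeout(connectTimeoutMs); // e.g. 15000, as in the question
            conn.setReadTimeout(readTimeoutMs);       // plays the role of the socket timeout
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            // Assumption: the server accepts a raw binary body with this content type.
            conn.setRequestProperty("Content-Type", "application/octet-stream");
            // Stream the body with a known length instead of buffering it internally.
            conn.setFixedLengthStreamingMode(body.length);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                ByteArrayOutputStream buffer = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buffer.write(chunk, 0, n);
                }
                return buffer.toByteArray();
            }
        } finally {
            conn.disconnect(); // release the underlying socket
        }
    }
}
```

A call would look like `UrlConnectionPoster.post(new URL(uri), requestByteArray, 15000, 300000)`, reusing the timeouts from the snippet above.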

Our web server is nginx, and our web service runs on Apache Tomcat. Customers are mostly using Android 4.1+ phones. The customer from whose phone I retrieved the above stack traces is using a Micromax A110Q phone with Android 4.2.1.

Any input on this will be highly appreciated. Thanks a lot!

Update:

  1. I noticed that we were not shutting down the connection manager, so I added the code below in the finally block of the code where I use the HTTP client.
  if (client != null) { client.getConnectionManager().shutdown(); }
  2. Updated the nginx configuration to accept request bodies up to 5 MB. The default is 1 MB, some clients were submitting more than 1 MB, and the server was severing the connection with a 413 error.
  client_max_body_size 5M;
  3. Also increased the nginx proxy read timeout so that it waits longer to receive data from the client.
  proxy_read_timeout 300;

With the above changes, the errors have reduced a bit. In the last week, I have seen the following two types of errors:

  1. org.apache.http.conn.ConnectTimeoutException: Connect to /103.xx.xx.xxx:443 timed out - This happens after 15 seconds, which is my connect timeout. I am assuming that this happens because the client is unable to reach the server due to network slowness or, as @JaySoyer pointed out, possibly due to network switching.

  2. java.net.SocketTimeoutException: SSL handshake timed out at org.apache.harmony.xnet.provider.jsse.NativeCrypto.SSL_do_handshake(Native Method) - This happens when the socket timeout expires. I am now using 1 minute as the socket timeout for small requests, and 3 and 6 minutes for payloads up to 75 KB and larger, respectively.

However, these errors have reduced considerably: I am seeing 1 failure in 100 requests, compared with 1 in 10 requests with the earlier version of my code.
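
The size-based socket timeouts described above could be picked with a small helper along these lines. This is a sketch, not the code from the app: the class name `SocketTimeouts` is hypothetical, and because the post does not say where "small" ends, the 10 KB cut-off is purely an assumption. Only the 75 KB boundary and the 1, 3, and 6 minute values come from the text.

```java
// Hypothetical helper that maps a payload size to a socket timeout.
final class SocketTimeouts {
    // Assumption: the post does not define "small"; 10 KB is an arbitrary placeholder.
    static final int SMALL_REQUEST_MAX_BYTES = 10 * 1024;
    // From the post: payloads up to 75 KB get the middle timeout.
    static final int MEDIUM_REQUEST_MAX_BYTES = 75 * 1024;

    /** Returns the socket timeout in milliseconds for a request body of the given size. */
    static int forPayload(int payloadBytes) {
        if (payloadBytes < 0) {
            throw new IllegalArgumentException("payloadBytes must not be negative");
        }
        if (payloadBytes <= SMALL_REQUEST_MAX_BYTES) {
            return 60_000;   // 1 minute for small requests
        }
        if (payloadBytes <= MEDIUM_REQUEST_MAX_BYTES) {
            return 180_000;  // 3 minutes for payloads up to 75 KB
        }
        return 360_000;      // 6 minutes for anything larger
    }
}
```

The result could then be passed to `HttpConnectionParams.setSoTimeout(params, SocketTimeouts.forPayload(requestByteArray.length))` before executing the request.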

I recently had to do an exhaustive analysis of my company's app, as we were seeing a bunch of similar errors and didn't know why. We ended up handing out custom builds of the app that logged their connection times, errors, signal quality, etc. to a file. We did that for weeks and collected thousands of data points. Keep in mind, we maintain a persistent connection while the app is open.

Turns out most of our errors were from switching networks. This is actually really common for an average user. Say a user is on an EDGE cell network, then walks within WiFi range, or vice versa. When this occurs, Android severs the cell connection and makes an entirely new connection over WiFi. From the app's perspective, it's similar to turning on airplane mode and then flicking it back off again. This even occurs when switching within a cell network, e.g. LTE to HSPA+. Each time this happens, Android fires off the network connectivity changed broadcast.

Of those you listed, this behavior was causing the following similar errors:

  • javax.net.ssl.SSLException: Write error: ssl=0x5e4f4640
  • javax.net.ssl.SSLException: SSL handshake aborted:

Sometimes the network switch was fast, sometimes slow. Turns out, we were not cleaning up our resources in time when the switch was fast. As a result, we were attempting to reconnect to our servers with stale TCP connections, which threw even more odd errors.

So I guess the takeaway is: if you are maintaining a connection for a long period of time, expect to see the phone constantly switch between networks, especially when the signal is weak. When that network switch occurs, you'll see SSLExceptions, and that's completely normal. Just make sure you clean up your resources and reconnect properly.
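
The "clean up and reconnect" advice can be sketched in plain Java as below. This is not the code from the app described in this answer; `ReconnectingSession`, `connection()`, and `onNetworkChanged()` are hypothetical names, and the connection is modelled as any `Closeable` so the example stays independent of a particular HTTP or socket library.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.function.Supplier;

// Hypothetical holder that never reuses a connection opened on a previous network.
final class ReconnectingSession<C extends Closeable> {
    private final Supplier<C> connector; // opens a fresh connection on the current network
    private C current;

    ReconnectingSession(Supplier<C> connector) {
        this.connector = connector;
    }

    /** Returns the live connection, opening one lazily if none exists. */
    synchronized C connection() {
        if (current == null) {
            current = connector.get();
        }
        return current;
    }

    /** Call when the network changes: drops and closes the stale connection. */
    synchronized void onNetworkChanged() {
        C stale = current;
        current = null; // force the next connection() call to reconnect
        if (stale != null) {
            try {
                stale.close();
            } catch (IOException ignored) {
                // The old socket may already be dead after the switch; nothing more to do.
            }
        }
    }
}
```

On Android, `onNetworkChanged()` would typically be called from whatever receives the connectivity changed broadcast (for example a `BroadcastReceiver` registered for `ConnectivityManager.CONNECTIVITY_ACTION`), so a fast network switch cannot leave a stale connection in use.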

Since you are dealing with what looks like poor network connectivity, consider a more fault-tolerant HTTP client. The one I like is OkHttp. From their description:

OkHttp perseveres when the network is troublesome: it will silently recover from common connection problems. If your service has multiple IP addresses OkHttp will attempt alternate addresses if the first connect fails. This is necessary for IPv4+IPv6 and for services hosted in redundant data centers. OkHttp initiates new connections with modern TLS features (SNI, ALPN), and falls back to SSLv3 if the handshake fails.

The implementation would be mostly a drop-in replacement.
