
DSE Hadoop intermittent timeout errors during read

I am hitting a strange error that started a few weeks ago. We had to replace several analytics nodes, and since then none of the Hadoop jobs invoked by Hive are able to finish. They crash at different stages with a similar error:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ip-x-x-x-x.ec2.internal/x.x.x.x:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
    at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:65)
    at com.datastax.driver.core.DefaultResultSetFuture.extractCauseFromExecutionException(DefaultResultSetFuture.java:256)
    at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.prepareNextRow(ArrayBackedResultSet.java:259)
    at com.datastax.driver.core.ArrayBackedResultSet$MultiPage.isExhausted(ArrayBackedResultSet.java:222)
    at com.datastax.driver.core.ArrayBackedResultSet$1.hasNext(ArrayBackedResultSet.java:115)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.computeNext(CqlRecordReader.java:239)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader$RowIterator.computeNext(CqlRecordReader.java:218)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138)
    at org.apache.cassandra.hadoop.cql3.CqlRecordReader.getProgress(CqlRecordReader.java:152)
    at org.apache.hadoop.hive.cassandra.cql3.input.CqlHiveRecordReader.getProgress(CqlHiveRecordReader.java:62)
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.getProgress(HiveRecordReader.java:71)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.getProgress(MapTask.java:260)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:233)
    at org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:216)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:48)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:436)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:266)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1121)
    at org.apache.hadoop.mapred.Child.main(Child.java:260)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: ip-x-x-x-x.ec2.internal/x.x.x.x:9042 (com.datastax.driver.core.exceptions.DriverException: Timeout during read))
    at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:103)
    at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:175)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

I turned on debug logging, but still could not find anything relevant happening around that time.

Thanks!

Actually, the problem was a huge amount of data written by the application into one of the map columns. It just happened to coincide with the cluster update, so the node replacement was a red herring. The Hive jobs were simply hanging, and the timeout error was misleading. By some trial and error I was able to narrow the issue down to that map column and delete the offending data.
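
For anyone hitting the same thing, a rough sketch of the cleanup, with hypothetical keyspace, table, column, and key names (my_ks, my_table, big_map, id) standing in for the real schema:

    -- Spot abnormally large partitions first (run from a shell, not cqlsh):
    --   nodetool cfhistograms my_ks my_table
    -- Then, in cqlsh, trim the oversized map column.
    -- Remove individual map entries:
    DELETE big_map['offending_key'] FROM my_ks.my_table WHERE id = 123;
    -- Or clear the whole collection for that row:
    UPDATE my_ks.my_table SET big_map = {} WHERE id = 123;

Note that deleting collection entries writes tombstones, so reads over that partition may stay slow until compaction removes them after gc_grace_seconds.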
