
Hadoop timing out trying to write to Cassandra in AWS multi-region configuration

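I'm running a multi-DC Cassandra cluster (open source, not DSE) in AWS, with one DC (us-west-2) set up for analytics and the other DC (us-east) used as the transactional store. I'm using NetworkTopologyStrategy with the EC2 snitch, and a consistency level of LOCAL_ONE in the Hadoop configuration. Hadoop can read from Cassandra without any problems, but attempts to write produce a timeout exception.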

Running nodetool status shows that the DCs are configured correctly:

Datacenter: us-west-2
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns   Host ID                               Token                                    Rack
UN  x.x.x.x       1.01 GB     9.9%   9e7f4393-7ac9-4559-b3ff-de48be50016f  -9127921345534057723                     2a
UN  x.x.x.x       1001.16 MB  11.4%  d0760383-c3dd-474c-9261-239b71dba3f1  -9221279003374097975                     2b
UN  x.x.x.x       1.05 GB     11.7%  3f09fbf5-0d85-4283-9009-0ec0e29223c0  -9140104347498952504                     2c
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Owns   Host ID                               Token                                    Rack
UN  x.x.x.x       1.1 GB     11.3%  5bbd2de4-e1d2-4a17-9f40-034f60b35954  -9061054426204373981                     1b
UN  x.x.x.x       1.15 GB    11.5%  e34c590e-6176-45b2-a8f9-18b4a9a80032  -9216519687724118609                     1c
UN  x.x.x.x       1.18 GB    10.9%  fa0b0a1a-f156-40fc-a267-970d1eb9cddb  -9207673937991303291                     1a
UN  x.x.x.x       1.46 GB    10.7%  b18ae406-c9ec-42b7-a365-b0c6e2fe582f  -9206671929961171506                     1a
UN  x.x.x.x       1.13 GB    11.4%  1ac9c1c5-55ad-4048-b1ba-3b9768933ecc  -9146100851344467112                     1c
UN  x.x.x.x       1.53 GB    11.2%  dad665bb-68d9-4811-b421-f33333261867  -9178920986366339267                     1b

Stack trace using ColumnFamilyOutputFormat:

java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
    at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter$RangeClient.run(ColumnFamilyRecordWriter.java:224)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
    at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
    at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
    at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
    at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.createAuthenticatedClient(AbstractColumnFamilyOutputFormat.java:123)
    at org.apache.cassandra.hadoop.ColumnFamilyRecordWriter$RangeClient.run(ColumnFamilyRecordWriter.java:215)
Caused by: java.net.ConnectException: Connection timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
    ... 4 more

...and using CqlOutputFormat:

java.io.IOException: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
    at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:271)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
    at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
    at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
    at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
    at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.createAuthenticatedClient(AbstractColumnFamilyOutputFormat.java:123)
    at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:262)
Caused by: java.net.ConnectException: Connection timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
    ... 4 more

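Both traces ultimately point to AbstractColumnFamilyOutputFormat.createAuthenticatedClient(host, port, conf).

I then opened up that source and added some detail to the exception so that it would report the host it was connecting to, which produced the following trace: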

java.io.IOException: java.lang.Exception: Unable to connect to host [hostname]
    at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:271)
Caused by: java.lang.Exception: Unable to connect to host [hostname]
    at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.createAuthenticatedClient(AbstractColumnFamilyOutputFormat.java:139)
    at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:262)
Caused by: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection timed out
    at org.apache.thrift.transport.TSocket.open(TSocket.java:185)
    at org.apache.thrift.transport.TFramedTransport.open(TFramedTransport.java:81)
    at org.apache.cassandra.thrift.TFramedTransportFactory.openTransport(TFramedTransportFactory.java:41)
    at org.apache.cassandra.hadoop.AbstractColumnFamilyOutputFormat.createAuthenticatedClient(AbstractColumnFamilyOutputFormat.java:124)
    ... 1 more
Caused by: java.net.ConnectException: Connection timed out
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
    at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
    at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
    at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
    at java.net.Socket.connect(Socket.java:579)
    at org.apache.thrift.transport.TSocket.open(TSocket.java:180)
    ... 4 more
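
A change of the kind described above, attaching the target host to the exception, might look roughly like the sketch below. This is not the actual modification to createAuthenticatedClient; the HostContext class, the withHost helper, and the 10.0.0.1 address are hypothetical names used only for illustration, and the connection failure is simulated so the example runs without a network.

```java
import java.net.ConnectException;
import java.util.concurrent.Callable;

public class HostContext {
    // Hypothetical helper: runs an action and, on failure, rethrows with the
    // target host in the message while keeping the original exception as the cause.
    static <T> T withHost(String host, Callable<T> action) throws Exception {
        try {
            return action.call();
        } catch (Exception e) {
            throw new Exception("Unable to connect to host " + host, e);
        }
    }

    public static void main(String[] args) {
        try {
            // Simulate the connect failure from the trace (placeholder address).
            withHost("10.0.0.1", () -> {
                throw new ConnectException("Connection timed out");
            });
        } catch (Exception e) {
            System.out.println(e.getMessage() + " <- " + e.getCause());
        }
    }
}
```

Keeping the original exception as the cause is what preserves the nested "Caused by" sections in the trace, so the host name is added without losing the underlying ConnectException.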

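The problem is that [hostname] is a machine that is not in the analytics cluster (it is in us-east). Why doesn't the client know this automatically, especially when reads work fine? It seems to be trying every node in the ring, regardless of DC.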

For the record, writes fail with both CqlOutputFormat and ColumnFamilyOutputFormat, as well as from Pig using CqlStorage and CassandraStorage.

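I would say try setting write_request_timeout_in_ms in cassandra.yaml to a very high number and see if that helps. The problem may occur when the node itself is still unresponsive at the time of the failure. If it still times out, restart the Cassandra service on the node that is causing the problem.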

The issue came down to two things:

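  1. For multi-region EC2 setups, Cassandra requires broadcast_address to be set to the public IP and listen_address to the internal IP. In most cases you would want rpc_address to be the internal IP as well, but that can break Cassandra's Hadoop client, which determines the endpoints to talk to based on broadcast_address.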

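  2. Cassandra's Hadoop client (specifically RingCache) does not take the data center into account during node discovery, and instead tries to discover all nodes in the ring, including non-local ones. It respects the consistency level for the actual writes, but in our case it never got that far because of #1.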

I filed a ticket and submitted patches to address these issues:

https://issues.apache.org/jira/browse/CASSANDRA-7252
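
To illustrate the second point, here is a minimal sketch of DC-aware endpoint selection, assuming the client already has a mapping from each endpoint to its data center. It is not the patch attached to the ticket and does not use Cassandra's RingCache API; the LocalDcFilter class, the localEndpoints method, and the IP addresses are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LocalDcFilter {
    // Returns only the endpoints whose data center equals localDc,
    // preserving the iteration order of the input map.
    static List<String> localEndpoints(Map<String, String> endpointToDc, String localDc) {
        List<String> local = new ArrayList<>();
        for (Map.Entry<String, String> e : endpointToDc.entrySet()) {
            if (localDc.equals(e.getValue())) {
                local.add(e.getKey());
            }
        }
        return local;
    }

    public static void main(String[] args) {
        // Placeholder addresses; the DC names match the nodetool output above.
        Map<String, String> ring = new LinkedHashMap<>();
        ring.put("10.0.1.10", "us-west-2");
        ring.put("10.0.1.11", "us-west-2");
        ring.put("10.1.2.20", "us-east");
        System.out.println(localEndpoints(ring, "us-west-2"));
    }
}
```

With a filter like this in place, a writer running in the analytics DC would only ever open connections to us-west-2 nodes, so an unreachable rpc_address in us-east would no longer cause a connect timeout.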

