Set Ceph endpoint to DNS doesn't work in Hadoop
I'm trying to set up a big data environment consisting of Hadoop (2.7), Spark (2.3), and Ceph (Luminous). Before changing fs.s3a.endpoint to a domain name, everything worked as expected.

The key part of core-site.xml looks like this:
<property>
  <name>fs.defaultFS</name>
  <value>s3a://tpcds</value>
</property>
<property>
  <name>fs.s3a.endpoint</name>
  <value>http://10.1.2.213:8080</value>
</property>
However, when I changed fs.s3a.endpoint to a domain name like below:
<property>
  <name>fs.s3a.endpoint</name>
  <value>http://gw.gearon.com:8080</value>
</property>
and then tried to launch Spark SQL on Hadoop YARN, the following error was thrown:
AmazonHttpClient:448 - Unable to execute HTTP request: tpcds.gw.gearon.com: Name or service not known
java.net.UnknownHostException: tpcds.gw.gearon.com: Name or service not known
at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
at java.net.InetAddress.getAllByName0(InetAddress.java:1277)
gw.gearon.com definitely resolves to 10.1.2.213. After some googling, I realized one more property should be set:
<property>
  <name>fs.s3a.path.style.access</name>
  <value>true</value>
  <description>Enable S3 path style access ie disabling the default virtual hosting behaviour.
    Useful for S3A-compliant storage providers as it removes the need to set up DNS for virtual hosting.
  </description>
</property>
After setting fs.s3a.path.style.access to true, the error disappears when launching Hadoop MapReduce jobs. However, for Spark SQL on Hadoop YARN, the error still occurs. I thought Spark might be overriding Hadoop's settings, so I also appended spark.hadoop.fs.s3a.path.style.access true to spark-defaults.conf, but it still doesn't work.
So here is the question: the endpoint I set is http://gw.gearon.com:8080, so why does the error say tpcds.gw.gearon.com is unknown? tpcds is my Ceph bucket name, which I set as my fs.defaultFS, and it looks fine in core-site.xml. How can I solve this issue?

Any comment is welcome, and thanks in advance for your help.
You should use "Amazon naming methods" (virtual-hosted-style bucket addressing), as described here and here. That is, create a wildcard DNS entry that resolves to the gateway(s). Note that a CNAME cannot point to an IP address, so use a wildcard A record:

*.gw.gearon.com. IN A 10.1.2.213
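If you run your own BIND-style name server, the wildcard entry might look like this in the zone file (a sketch; the origin and TTL are assumptions about your zone layout):

```
$ORIGIN gw.gearon.com.
$TTL 300
*    IN    A    10.1.2.213
```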
Also be sure to set that name in the gateway configuration (documentation here); for this setup it would be:

rgw dns name = gw.gearon.com
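The reason the bucket name shows up in the hostname can be sketched in a few lines of Python (a hypothetical helper for illustration, not part of any S3 SDK): with virtual-hosted-style addressing, the client prepends the bucket name to the endpoint host, which is exactly where tpcds.gw.gearon.com comes from.

```python
from urllib.parse import urlparse

def s3_request_host(endpoint: str, bucket: str, path_style: bool) -> str:
    """Illustrates how an S3 client chooses the request host.

    Virtual-hosted style (the default) prepends the bucket name to the
    endpoint host, so the client needs DNS for <bucket>.<endpoint-host>.
    Path-style keeps the endpoint host unchanged and puts the bucket in
    the URL path instead.
    """
    host = urlparse(endpoint).hostname
    return host if path_style else f"{bucket}.{host}"

print(s3_request_host("http://gw.gearon.com:8080", "tpcds", path_style=False))
# -> tpcds.gw.gearon.com  (requires the wildcard DNS record)
print(s3_request_host("http://gw.gearon.com:8080", "tpcds", path_style=True))
# -> gw.gearon.com        (no extra DNS needed)
```

Once the wildcard record and rgw dns name are in place, both addressing styles resolve, and the path-style workaround becomes optional.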