
Setting the Ceph endpoint to a DNS name doesn't work in Hadoop

I'm trying to set up a big data environment consisting of Hadoop (2.7), Spark (2.3), and Ceph (Luminous). Before changing fs.s3a.endpoint to a domain name, everything worked as expected.

The key part of core-site.xml looks like this:

<property>
    <name>fs.defaultFS</name>
    <value>s3a://tpcds</value>
</property>
<property>
    <name>fs.s3a.endpoint</name>
    <value>http://10.1.2.213:8080</value>
</property>

However, when I changed fs.s3a.endpoint to a domain name like below:

<property>
    <name>fs.s3a.endpoint</name>
    <value>http://gw.gearon.com:8080</value>
</property>

and then tried to launch Spark SQL on Hadoop YARN, the following error was thrown:

AmazonHttpClient:448 - Unable to execute HTTP request: tpcds.gw.gearon.com: Name or service not known
java.net.UnknownHostException: tpcds.gw.gearon.com: Name or service not known
        at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
        at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:929)
        at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1324)
        at java.net.InetAddress.getAllByName0(InetAddress.java:1277)

The name gw.gearon.com definitely resolves to 10.1.2.213. After some googling, I realized one more property should be set:

<property>
  <name>fs.s3a.path.style.access</name>
  <value>true</value>
  <description>Enable S3 path style access ie disabling the default virtual hosting behaviour.
    Useful for S3A-compliant storage providers as it removes the need to set up DNS for virtual hosting.
  </description>
</property>

After setting fs.s3a.path.style.access to true, the error disappears when launching Hadoop MapReduce. However, for Spark SQL on Hadoop YARN, the error persists. I thought maybe Spark overrides Hadoop's settings, so I also appended spark.hadoop.fs.s3a.path.style.access true to spark-defaults.conf, but it still doesn't work.

So here is the question: the endpoint I set is http://gw.gearon.com:8080, so why does the error say tpcds.gw.gearon.com is unknown? tpcds is my Ceph bucket name; I set it as my fs.defaultFS, and it looks fine in core-site.xml. How can I solve this?

Any comment is welcome, and thanks in advance for your help.

You should use "Amazon naming methods", as described here and here.
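The reason the bucket name shows up in the hostname is virtual-hosted-style addressing, the S3 client's default: the bucket is prepended to the endpoint host, so s3a://tpcds against http://gw.gearon.com:8080 is requested as tpcds.gw.gearon.com:8080, which is exactly the unresolvable name in your error. A rough illustration of the two addressing styles (not S3A's actual code, just a sketch):

```python
from urllib.parse import urlparse

def s3_request_url(endpoint: str, bucket: str, key: str, path_style: bool) -> str:
    """Build the request URL the way an S3 client would, in either addressing style."""
    parsed = urlparse(endpoint)
    if path_style:
        # Path-style: the bucket goes into the URL path, so only the
        # endpoint host itself needs a DNS record.
        return f"{parsed.scheme}://{parsed.netloc}/{bucket}/{key}"
    # Virtual-hosted-style: the bucket becomes a subdomain of the endpoint
    # host, so DNS must resolve <bucket>.<endpoint-host>.
    return f"{parsed.scheme}://{bucket}.{parsed.netloc}/{key}"

print(s3_request_url("http://gw.gearon.com:8080", "tpcds", "data.parquet", False))
# virtual-hosted: http://tpcds.gw.gearon.com:8080/data.parquet
print(s3_request_url("http://gw.gearon.com:8080", "tpcds", "data.parquet", True))
# path-style:     http://gw.gearon.com:8080/tpcds/data.parquet
```

This is why a single A record for gw.gearon.com is not enough: every bucket produces a different hostname.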

That is, point a wildcard DNS record at the gateway(s):

*.gw.gearon.com. IN A 10.1.2.213
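In a BIND-style zone file for gearon.com, this could look roughly like the fragment below (a sketch; only the 10.1.2.213 address and the gw name come from the question, the rest of the zone layout is assumed):

```
; hypothetical zone fragment for gearon.com
gw      IN A    10.1.2.213   ; the RGW endpoint itself
*.gw    IN A    10.1.2.213   ; every <bucket>.gw.gearon.com resolves to the gateway
```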

Also be sure to properly set up that name on the gateways themselves, in ceph.conf (documentation here):

rgw dns name = clover.voxelgroup.net
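For the setup in the question, the equivalent would presumably be the following (the section name is just a typical RGW instance name, adjust it to your own deployment):

```
[client.rgw.gateway]
rgw dns name = gw.gearon.com
```

Restart the RGW daemon after changing this so it starts accepting virtual-hosted bucket hostnames.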
