Standalone hive metastore with Iceberg and S3
I'd like to use Presto to query Iceberg tables stored in S3 as Parquet files, so I need a Hive metastore. I'm running a standalone Hive metastore service backed by MySQL, and I've configured Iceberg to use the Hive catalog:
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.hive.HiveCatalog;

public class MetastoreTest {
    public static void main(String[] args) {
        // Point the Iceberg client at the standalone metastore and the S3 warehouse
        Configuration conf = new Configuration();
        conf.set("hive.metastore.uris", "thrift://x.x.x.x:9083");
        conf.set("hive.metastore.warehouse.dir", "s3://bucket/warehouse");

        HiveCatalog catalog = new HiveCatalog(conf);
        catalog.createNamespace(Namespace.of("my_metastore"));
    }
}
I'm getting the following error:

Caused by: MetaException(message:Got exception: org.apache.hadoop.fs.UnsupportedFileSystemException No FileSystem for scheme "s3")
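A side note on the scheme itself: the exception complains about "s3", and Hadoop 3.x ships only the S3A connector (s3a://) from hadoop-aws; the old s3:// and s3n:// filesystems were removed. Below is a minimal sketch of the same client with the warehouse URI switched to s3a://. This alone may not be enough, since the MetaException is wrapped by the metastore service, so the S3A configuration must also be visible on the metastore host, which is what the core-site.xml in the answer below provides:

import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.catalog.Namespace;
import org.apache.iceberg.hive.HiveCatalog;

public class MetastoreTestS3A {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("hive.metastore.uris", "thrift://x.x.x.x:9083");
        // s3a://, not s3:// -- Hadoop 3.x ships only the S3A connector
        conf.set("hive.metastore.warehouse.dir", "s3a://bucket/warehouse");

        HiveCatalog catalog = new HiveCatalog(conf);
        catalog.createNamespace(Namespace.of("my_metastore"));
    }
}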
I've included /hadoop-3.3.0/share/hadoop/tools/lib in HADOOP_CLASSPATH, and also copied the AWS-related jars to apache-hive-metastore-3.0.0-bin/lib. What else is missing?
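One quick way to check whether those AWS jars are actually visible to a given JVM is to probe for the S3A connector class. This diagnostic snippet is not from the original post, just a hypothetical check:

public class S3AClasspathCheck {
    public static void main(String[] args) {
        // Probe for the S3A connector up front instead of waiting for a
        // "No FileSystem for scheme" failure at runtime.
        try {
            Class.forName("org.apache.hadoop.fs.s3a.S3AFileSystem");
            System.out.println("hadoop-aws is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("hadoop-aws is NOT on the classpath");
        }
    }
}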
Finally figured this out. First (as I already mentioned above) I had to include hadoop/share/hadoop/tools/lib in HADOOP_CLASSPATH. However, neither modifying HADOOP_CLASSPATH nor copying particular files from tools to common worked for me. Then I switched to hadoop-2.7.7 and it worked. I also had to copy the Jackson-related jars from tools to common. My hadoop/etc/hadoop/core-site.xml looks like this:
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>s3a://{bucket_name}</value>
    </property>
    <property>
        <name>fs.s3a.impl</name>
        <value>org.apache.hadoop.fs.s3a.S3AFileSystem</value>
    </property>
    <property>
        <name>fs.s3a.endpoint</name>
        <value>{s3_endpoint}</value>
        <description>AWS S3 endpoint to connect to. An up-to-date list is
            provided in the AWS Documentation: regions and endpoints. Without this
            property, the standard region (s3.amazonaws.com) is assumed.
        </description>
    </property>
    <property>
        <name>fs.s3a.access.key</name>
        <value>{access_key}</value>
    </property>
    <property>
        <name>fs.s3a.secret.key</name>
        <value>{secret_key}</value>
    </property>
</configuration>
At this point, you should be able to ls your S3 bucket:
hadoop fs -ls s3a://{bucket}/
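The same sanity check can be done from Java. A minimal sketch, assuming the core-site.xml above is on the classpath; the class name and the {bucket} placeholder are mine, not from the original post:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3ListCheck {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml (and the fs.s3a.* keys) from the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(new URI("s3a://{bucket}/"), conf);
        // Equivalent of `hadoop fs -ls s3a://{bucket}/`
        for (FileStatus status : fs.listStatus(new Path("/"))) {
            System.out.println(status.getPath());
        }
    }
}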