Flink FileSink with AWS S3 plugin throws an error when accessing an S3 access point: "null uri host"

After following these instructions, I can access the S3 bucket through an access point and a VPC endpoint from the AWS CLI without any problems.

Basically, I use

s3://arn:aws:s3:ap-southeast-1:<account number>:accesspoint/<bucket name>

the same way as I use

s3://<bucket name>

All aws s3... commands work great.

However, that is not the case for my Java-based Flink project. The code works great with s3://<bucket name>, but it does not seem to recognize the new S3 URI.

Here is how the sink is defined in my code:

final FileSink<ConsumerRecordPOJO<CacheInfo>> sink = FileSink //
                .<ConsumerRecordPOJO<CacheInfo>>forRowFormat(new Path(s3Url),
                        new Encoder<ConsumerRecordPOJO<CacheInfo>>() {

                            @Override
                            public void encode(ConsumerRecordPOJO<CacheInfo> record, OutputStream stream)
                                    throws IOException {
                                GzipParameters params = new GzipParameters();
                                params.setCompressionLevel(Deflater.BEST_COMPRESSION);

                                GzipCompressorOutputStream out = new GzipCompressorOutputStream(stream, params);

                                OBJECT_MAPPER.writeValue(out, record);

                                out.finish();
                            }

                        }) //
                // (some extra configuration omitted here)
                .build();

After passing s3://arn:aws:s3:ap-southeast-1:<account number>:accesspoint/<bucket name> as the s3Url parameter, the job execution failed with:

2021-11-26 22:14:34,085 WARN  org.apache.flink.runtime.taskmanager.Task                    [] - Source: kafka -> Filter -> Map -> Sink file (1/1)#3 (c654160d3fab026c4544ca8a64644796) switched from INITIALIZING to FAILED with failure cause: org.apache.flink.util.FlinkRuntimeException: Could not create writer state serializer.
        at org.apache.flink.connector.file.sink.FileSink.getWriterStateSerializer(FileSink.java:135)
        at org.apache.flink.streaming.runtime.operators.sink.SinkOperatorFactory.createStreamOperator(SinkOperatorFactory.java:63)
        at org.apache.flink.streaming.api.operators.StreamOperatorFactoryUtil.createOperator(StreamOperatorFactoryUtil.java:81)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperator(OperatorChain.java:712)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:686)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:626)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:676)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:626)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOperatorChain(OperatorChain.java:676)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain.createOutputCollector(OperatorChain.java:626)
        at org.apache.flink.streaming.runtime.tasks.OperatorChain.<init>(OperatorChain.java:187)
        at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.<init>(RegularOperatorChain.java:63)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:666)
        at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654)
        at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958)
        at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927)
        at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:766)
        at org.apache.flink.runtime.taskmanager.Task.run(Task.java:575)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: null uri host.
        at org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory.create(AbstractS3FileSystemFactory.java:162)
        at org.apache.flink.core.fs.PluginFileSystemFactory.create(PluginFileSystemFactory.java:62)
        at org.apache.flink.core.fs.FileSystem.getUnguardedFileSystem(FileSystem.java:508)
        at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:409)
        at org.apache.flink.connector.file.sink.FileSink$RowFormatBuilder.createBucketWriter(FileSink.java:326)
        at org.apache.flink.connector.file.sink.FileSink$RowFormatBuilder.getWriterStateSerializer(FileSink.java:307)
        at org.apache.flink.connector.file.sink.FileSink.getWriterStateSerializer(FileSink.java:130)
        ... 18 more
Caused by: java.lang.NullPointerException: null uri host.
        at java.util.Objects.requireNonNull(Objects.java:228)
        at org.apache.hadoop.fs.s3native.S3xLoginHelper.buildFSURI(S3xLoginHelper.java:71)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.setUri(S3AFileSystem.java:486)
        at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:246)
        at org.apache.flink.fs.s3.common.AbstractS3FileSystemFactory.create(AbstractS3FileSystemFactory.java:123)
        ... 24 more
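
The innermost cause in the trace is an Objects.requireNonNull check in S3xLoginHelper.buildFSURI with the message "null uri host.". A likely explanation (an assumption, not something the trace itself confirms) is that java.net.URI cannot interpret the colon-separated ARN as a host name, so getHost() returns null. Below is a minimal sketch that checks this with the JDK alone. The class name, the account number 123456789012, and the names my-bucket and my-access-point are placeholders, not values from the question.

```java
import java.net.URI;

// Hypothetical helper class, for illustration only.
class S3UriHostCheck {

    // Returns the host component that java.net.URI extracts, or null if none.
    static String hostOf(String uri) {
        return URI.create(uri).getHost();
    }

    public static void main(String[] args) {
        // Plain bucket name: the authority parses as a regular host.
        System.out.println(hostOf("s3://my-bucket/some/prefix"));

        // ARN-style authority: the extra colons cannot be read as host:port,
        // so the URI is kept as a registry-based authority without a host.
        System.out.println(hostOf(
                "s3://arn:aws:s3:ap-southeast-1:123456789012:accesspoint/my-access-point"));
    }
}
```

If the second call prints null, the URI never reaches S3 at all: it is rejected while the file system is being initialized, which matches the point of failure in the trace.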

It turns out that I can use the S3 access point alias instead, which works perfectly with Flink.

See https://docs.aws.amazon.com/AmazonS3/latest/userguide/access-points-alias.html
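
An access point alias has the shape of an ordinary bucket name, so it can take the place of <bucket name> in the s3:// URL passed to new Path(s3Url). The sketch below builds such a URL and fails fast if the result has no host, which would catch an ARN before the job is submitted. The class name, the method sinkBasePath, the prefix flink-output, and the alias value are all hypothetical; a real alias is shown in the S3 console or returned by the access point APIs.

```java
import java.net.URI;

// Hypothetical helper class, for illustration only.
class AccessPointAliasPath {

    // Builds an s3:// base path and rejects values that yield no URI host.
    static String sinkBasePath(String bucketOrAlias, String keyPrefix) {
        String url = "s3://" + bucketOrAlias + "/" + keyPrefix;
        if (URI.create(url).getHost() == null) {
            throw new IllegalArgumentException("No host in S3 URL: " + url);
        }
        return url;
    }

    public static void main(String[] args) {
        // Placeholder alias in the documented <name>-<metadata>-s3alias form.
        String alias = "my-access-point-hrzrlukc5m36ft7okagglf3gmwluquse1b-s3alias";
        String s3Url = sinkBasePath(alias, "flink-output");
        System.out.println(s3Url); // value to hand to new Path(s3Url) in the sink
    }
}
```

The check relies only on java.net.URI, so it says nothing about permissions or network reachability; it only guards against the malformed-host case seen in the stack trace.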
