
Error writing to S3 using TextIO.write() (Google Dataflow)

I have Google Dataflow code similar to the following:

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.options.*;
import org.apache.beam.sdk.transforms.Create;

public class S3Test {

    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline pipeline = Pipeline.create(options);

        ResourceId outputDir = FileSystems.matchNewResource("s3://my-bucket/temp/", true);

        pipeline
                .apply("New",
                        Create.of("Hello World!!")
                )
                .apply(
                        "Write to S3",
                        TextIO.write()
                                .to("s3://my-bucket/test.txt")
                                .withoutSharding()
                                .withTempDirectory(outputDir)
                );

        pipeline.run();
    }
}

This works to some extent: it first creates a temporary file in my S3 bucket. However, when it tries to rename that file to my desired final filename, it throws the following error:

Receiver class org.apache.beam.sdk.io.aws.s3.S3FileSystem does not define or inherit an implementation of the resolved method 'abstract void rename(java.util.List, java.util.List, org.apache.beam.sdk.io.fs.MoveOptions[])' of abstract class org.apache.beam.sdk.io.FileSystem.

I guess I have to override or implement a rename() function, but I'm not sure where to start.

Has anyone successfully written to S3 using TextIO.write()?
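The two phases described above (write a temporary file, then move it to the final name) can be sketched on a local disk with java.nio.file. This is not Beam's code, only an illustration of the same pattern; the class TempThenRename and its method are hypothetical. The comment about S3 reflects that S3 has no native rename operation, so a filesystem implementation has to emulate it, for example by copying and then deleting.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TempThenRename {
    /**
     * Hypothetical helper: phase 1 writes the contents to a temp file inside tempDir,
     * phase 2 moves that temp file to finalPath (replacing any existing file).
     */
    static Path writeViaTemp(Path tempDir, Path finalPath, String contents) throws IOException {
        Files.createDirectories(tempDir);
        Path temp = Files.createTempFile(tempDir, ".temp-", null);
        Files.write(temp, contents.getBytes(StandardCharsets.UTF_8));
        // Phase 2: finalize. On a local disk this is a real move/rename; on an
        // object store such as S3 it has to be emulated (e.g. copy, then delete).
        return Files.move(temp, finalPath, StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("demo");
        Path out = writeViaTemp(root.resolve("temp"), root.resolve("test.txt"), "Hello World!!\n");
        System.out.print(Files.readString(out)); // prints Hello World!!
    }
}
```

In the failing pipeline, phase 1 succeeds and phase 2 is where the AbstractMethodError is raised.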

EDIT:

Below is the full output from the Run window:

    Dec 24, 2022 4:56:34 PM org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn processElement
    INFO: Opening writer 8a0dbf70-a0fc-48dd-92f9-8ff4000a8076 for window org.apache.beam.sdk.transforms.windowing.GlobalWindow@2dddc1b9 pane PaneInfo{isFirst=true, isLast=true, timing=ON_TIME, index=0, onTimeIndex=0} destination null
    Dec 24, 2022 4:56:37 PM org.apache.beam.sdk.io.FileBasedSink$Writer close
    INFO: Successfully wrote temporary file s3://my-bucket/temp/.temp-beam-13da7e3e-15a9-4285-a90c-30c7c6dcfe99/9e53ec6a8a0dbf70-a0fc-48dd-92f9-8ff4000a8076
    Dec 24, 2022 4:56:37 PM org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn process
    INFO: Finalizing 1 file results
    Dec 24, 2022 4:56:37 PM org.apache.beam.sdk.io.FileBasedSink$WriteOperation createMissingEmptyShards
    INFO: Finalizing for destination null num shards 1.
    Dec 24, 2022 4:56:37 PM org.apache.beam.sdk.io.FileBasedSink$WriteOperation moveToOutputFiles
    INFO: Will copy temporary file FileResult{tempFilename=s3://my-bucket/temp/.temp-beam-13da7e3e-15a9-4285-a90c-30c7c6dcfe99/9e53ec6a8a0dbf70-a0fc-48dd-92f9-8ff4000a8076, shard=0, window=org.apache.beam.sdk.transforms.windowing.GlobalWindow@2dddc1b9, paneInfo=PaneInfo{isFirst=true, isLast=true, timing=ON_TIME, index=0, onTimeIndex=0}} to final location s3://my-bucket/test.txt
    Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.AbstractMethodError: Receiver class org.apache.beam.sdk.io.aws.s3.S3FileSystem does not define or inherit an implementation of the resolved method 'abstract void rename(java.util.List, java.util.List, org.apache.beam.sdk.io.fs.MoveOptions[])' of abstract class org.apache.beam.sdk.io.FileSystem.
        at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:374)
        at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:342)
        at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:218)
        at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:323)
        at org.apache.beam.sdk.Pipeline.run(Pipeline.java:309)
        at com.example.S3Test.main(S3Test.java:255)
    Caused by: java.lang.AbstractMethodError: Receiver class org.apache.beam.sdk.io.aws.s3.S3FileSystem does not define or inherit an implementation of the resolved method 'abstract void rename(java.util.List, java.util.List, org.apache.beam.sdk.io.fs.MoveOptions[])' of abstract class org.apache.beam.sdk.io.FileSystem.
        at org.apache.beam.sdk.io.FileSystems.renameInternal(FileSystems.java:323)
        at org.apache.beam.sdk.io.FileSystems.rename(FileSystems.java:308)
        at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:802)
        at org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:1077)
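The JDK documents java.lang.AbstractMethodError as an error that normally can only occur at run time if the definition of some class has incompatibly changed since the calling code was compiled. A plausible reading of this trace (an assumption, not confirmed in the post) is that S3FileSystem and the abstract FileSystem class come from Beam artifacts of different versions. A quick way to check is to print which jar each class was loaded from. WhichJar is a hypothetical helper that uses only standard JDK reflection; the two class names are taken from the error message.

```java
import java.security.CodeSource;

public class WhichJar {
    /** Returns the jar/directory a class was loaded from, or a short note if unavailable. */
    static String locationOf(String className) {
        try {
            // initialize=false: only load the class, don't run its static initializers
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            if (src == null || src.getLocation() == null) {
                return "(no code source; likely a JDK bootstrap class)";
            }
            return src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String[] classes = {
            "org.apache.beam.sdk.io.FileSystem",          // abstract class declaring rename(...)
            "org.apache.beam.sdk.io.aws.s3.S3FileSystem"  // receiver class named in the error
        };
        for (String name : classes) {
            System.out.println(name + " -> " + locationOf(name));
        }
    }
}
```

If the two jar paths show different Beam version numbers, aligning the Beam dependency versions in the build is worth trying before writing any rename() code.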

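Try updating your pipeline to replace withoutSharding() with withNumShards(1). By default the number of shards is 0, which lets the runner decide how many output files to write; withNumShards(1) forces all output into a single file. Note that, unlike withoutSharding(), it keeps the default shard name template, so the output file will be named test.txt-00000-of-00001 rather than test.txt: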

pipeline
        .apply("New",
                Create.of("Hello World!!")
        )
        .apply(
                "Write to S3",
                TextIO.write()
                        .to("s3://my-bucket/test.txt")
                        .withNumShards(1)
                        .withTempDirectory(outputDir)
        );
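The visible difference between the two calls is the output file name. The sketch below is a simplified, hypothetical re-implementation (not Beam's DefaultFilenamePolicy) of shard-template expansion, assuming Beam's default unwindowed template -SSSSS-of-NNNNN: each run of S becomes the zero-padded shard index and each run of N the zero-padded shard count. It ignores the window/pane placeholders Beam also supports.

```java
public class ShardName {
    /** Simplified expansion: runs of 'S' -> shard index, runs of 'N' -> shard count, zero-padded to run length. */
    static String expand(String prefix, String template, int shard, int numShards) {
        StringBuilder out = new StringBuilder(prefix);
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            int j = i;
            while (j < template.length() && template.charAt(j) == c) {
                j++; // extend the run of identical characters
            }
            int width = j - i;
            if (c == 'S') {
                out.append(pad(shard, width));
            } else if (c == 'N') {
                out.append(pad(numShards, width));
            } else {
                out.append(template, i, j); // literal characters are copied unchanged
            }
            i = j;
        }
        return out.toString();
    }

    private static String pad(int value, int width) {
        return String.format("%0" + width + "d", value);
    }

    public static void main(String[] args) {
        String prefix = "s3://my-bucket/test.txt";
        // withNumShards(1): default template is kept
        System.out.println(expand(prefix, "-SSSSS-of-NNNNN", 0, 1)); // prints s3://my-bucket/test.txt-00000-of-00001
        // withoutSharding(): empty template, prefix used as-is
        System.out.println(expand(prefix, "", 0, 1)); // prints s3://my-bucket/test.txt
    }
}
```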

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
