I have Google Dataflow code similar to the following:
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.FileSystems;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.fs.ResourceId;
import org.apache.beam.sdk.options.*;
import org.apache.beam.sdk.transforms.Create;

public class S3Test {
    public static void main(String[] args) {
        PipelineOptions options = PipelineOptionsFactory.fromArgs(args).withValidation().create();
        Pipeline pipeline = Pipeline.create(options);
        ResourceId outputDir = FileSystems.matchNewResource("s3://my-bucket/temp/", true);
        pipeline
            .apply("New",
                Create.of("Hello World!!")
            )
            .apply(
                "Write to S3",
                TextIO.write()
                    .to("s3://my-bucket/test.txt")
                    .withoutSharding()
                    .withTempDirectory(outputDir)
            );
        pipeline.run();
    }
}
This works to some extent, in that it first creates a temporary file in my S3 bucket. However, when it tries to rename that file to the final output filename I specified, it throws the following error.
Receiver class org.apache.beam.sdk.io.aws.s3.S3FileSystem does not define or inherit an implementation of the resolved method 'abstract void rename(java.util.List, java.util.List, org.apache.beam.sdk.io.fs.MoveOptions[])' of abstract class org.apache.beam.sdk.io.FileSystem.
I guess I have to override or implement a rename() method, but I'm not sure where to start with that.
Has anyone successfully written to S3 using TextIO.write()?
EDIT:
Below is the full message in the Run window:
Dec 24, 2022 4:56:34 PM org.apache.beam.sdk.io.WriteFiles$WriteShardsIntoTempFilesFn processElement
INFO: Opening writer 8a0dbf70-a0fc-48dd-92f9-8ff4000a8076 for window org.apache.beam.sdk.transforms.windowing.GlobalWindow@2dddc1b9 pane PaneInfo{isFirst=true, isLast=true, timing=ON_TIME, index=0, onTimeIndex=0} destination null
Dec 24, 2022 4:56:37 PM org.apache.beam.sdk.io.FileBasedSink$Writer close
INFO: Successfully wrote temporary file s3://my-bucket/temp/.temp-beam-13da7e3e-15a9-4285-a90c-30c7c6dcfe99/9e53ec6a8a0dbf70-a0fc-48dd-92f9-8ff4000a8076
Dec 24, 2022 4:56:37 PM org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn process
INFO: Finalizing 1 file results
Dec 24, 2022 4:56:37 PM org.apache.beam.sdk.io.FileBasedSink$WriteOperation createMissingEmptyShards
INFO: Finalizing for destination null num shards 1.
Dec 24, 2022 4:56:37 PM org.apache.beam.sdk.io.FileBasedSink$WriteOperation moveToOutputFiles
INFO: Will copy temporary file FileResult{tempFilename=s3://my-bucket/temp/.temp-beam-13da7e3e-15a9-4285-a90c-30c7c6dcfe99/9e53ec6a8a0dbf70-a0fc-48dd-92f9-8ff4000a8076, shard=0, window=org.apache.beam.sdk.transforms.windowing.GlobalWindow@2dddc1b9, paneInfo=PaneInfo{isFirst=true, isLast=true, timing=ON_TIME, index=0, onTimeIndex=0}} to final location s3://my-bucket/test.txt
Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: java.lang.AbstractMethodError: Receiver class org.apache.beam.sdk.io.aws.s3.S3FileSystem does not define or inherit an implementation of the resolved method 'abstract void rename(java.util.List, java.util.List, org.apache.beam.sdk.io.fs.MoveOptions[])' of abstract class org.apache.beam.sdk.io.FileSystem.
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:374)
    at org.apache.beam.runners.direct.DirectRunner$DirectPipelineResult.waitUntilFinish(DirectRunner.java:342)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:218)
    at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:67)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:323)
    at org.apache.beam.sdk.Pipeline.run(Pipeline.java:309)
    at com.example.S3Test.main(S3Test.java:255)
Caused by: java.lang.AbstractMethodError: Receiver class org.apache.beam.sdk.io.aws.s3.S3FileSystem does not define or inherit an implementation of the resolved method 'abstract void rename(java.util.List, java.util.List, org.apache.beam.sdk.io.fs.MoveOptions[])' of abstract class org.apache.beam.sdk.io.FileSystem.
    at org.apache.beam.sdk.io.FileSystems.renameInternal(FileSystems.java:323)
    at org.apache.beam.sdk.io.FileSystems.rename(FileSystems.java:308)
    at org.apache.beam.sdk.io.FileBasedSink$WriteOperation.moveToOutputFiles(FileBasedSink.java:802)
    at org.apache.beam.sdk.io.WriteFiles$FinalizeTempFileBundles$FinalizeFn.process(WriteFiles.java:1077)
Try updating your pipeline to set the shard count explicitly. By default withNumShards() is 0, which leaves the number of output shards up to the runner; withNumShards(1) forces a single output file:
pipeline
    .apply("New",
        Create.of("Hello World!!")
    )
    .apply(
        "Write to S3",
        TextIO.write()
            .to("s3://my-bucket/test.txt")
            .withNumShards(1)
            .withTempDirectory(outputDir)
    );
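If the error persists after that change, it may be worth checking for a version mismatch on the classpath. An AbstractMethodError generally means a class was compiled against an older version of its abstract parent, so S3FileSystem here may come from an AWS IO jar that is older than the Beam core jar defining FileSystem.rename(List, List, MoveOptions[]). The sketch below uses only the JDK to print which jar each class is loaded from. BeamVersionCheck and describe are hypothetical names, and the version string is only available if the jar's manifest declares Implementation-Version, which is an assumption.

```java
import java.security.CodeSource;

public class BeamVersionCheck {

    /** Reports the jar a class is loaded from and the version that jar declares, if any. */
    public static String describe(String className) {
        try {
            // Load without initializing, so no static initializers run.
            Class<?> cls = Class.forName(className, false, BeamVersionCheck.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            String location = (source == null || source.getLocation() == null)
                    ? "(bootstrap or unknown)"
                    : source.getLocation().toString();
            Package pkg = cls.getPackage();
            // Assumption: the jar manifest sets Implementation-Version; otherwise this is null.
            String version = (pkg == null) ? null : pkg.getImplementationVersion();
            return className + " -> " + location
                    + ", version " + (version == null ? "(not declared)" : version);
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> not found on the classpath";
        }
    }

    public static void main(String[] args) {
        // Class names taken from the error message in the question.
        System.out.println(describe("org.apache.beam.sdk.io.FileSystem"));
        System.out.println(describe("org.apache.beam.sdk.io.aws.s3.S3FileSystem"));
    }
}
```

Run it with the same classpath as the pipeline. If the two lines point at jars with different Beam versions, aligning the AWS IO dependency with the core SDK version is the likely fix, and no custom rename() implementation should be needed.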