
Why does the Apache Beam write transform write to multiple files?

I was looking at the WordCount example from Apache Beam, and when I ran it locally, it wrote the counts to multiple files. I then created a test project that reads data from a file and writes it back out, and that write also produced multiple output files. How do I get the result in a single file? I am using the direct runner.

This happens for performance reasons: the output is split into shards so that they can be written in parallel. You should be able to force a single file by using TextIO.Write.withoutSharding. From its documentation:

withoutSharding

public TextIO.Write withoutSharding()

Forces a single file as output and empty shard name template. This option is only compatible with unwindowed writes.

For unwindowed writes, constraining the number of shards is likely to reduce the performance of a pipeline. Setting this value is not recommended unless you require a specific number of output files.

This is equivalent to .withNumShards(1).withShardNameTemplate("")
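
To see why an empty shard name template yields one plainly named file, here is a minimal sketch of how a template is expanded into a file name. It uses only the JDK; the class ShardNameSketch and the method expand are hypothetical helpers written for illustration, not part of Beam, and the logic is simplified (it ignores windowed templates). It assumes Beam's default unwindowed template "-SSSSS-of-NNNNN", where a run of S is the zero-padded shard index and a run of N is the zero-padded shard count.

```java
import java.util.Locale;

// Hypothetical illustration, not Beam code: a simplified model of how a
// shard name template turns into output file names.
class ShardNameSketch {

    // Builds prefix + expanded template + suffix.
    // A run of 'S' becomes the zero-padded shard index,
    // a run of 'N' becomes the zero-padded total shard count.
    static String expand(String prefix, String template, String suffix,
                         int shard, int numShards) {
        StringBuilder out = new StringBuilder(prefix);
        int i = 0;
        while (i < template.length()) {
            char c = template.charAt(i);
            if (c == 'S' || c == 'N') {
                int j = i;
                while (j < template.length() && template.charAt(j) == c) {
                    j++;
                }
                int width = j - i;
                int value = (c == 'S') ? shard : numShards;
                out.append(String.format(Locale.ROOT, "%0" + width + "d", value));
                i = j;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.append(suffix).toString();
    }

    public static void main(String[] args) {
        // Default behaviour: the runner picks the shard count (3 is assumed here).
        for (int s = 0; s < 3; s++) {
            System.out.println(expand("counts", "-SSSSS-of-NNNNN", ".txt", s, 3));
        }
        // withoutSharding(): one shard and an empty template.
        // In a pipeline this corresponds to something like
        //   TextIO.write().to("counts").withSuffix(".txt").withoutSharding()
        System.out.println(expand("counts", "", ".txt", 0, 1));
    }
}
```

With one shard and an empty template nothing is appended between the prefix and the suffix, so the single output file is named exactly what you passed to the write transform.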
