
Using TextIO.Write with a complicated PCollection type in Google Cloud Dataflow

I have a PCollection that looks like this:

PCollection<KV<KV<String, EventSession>, Long>> windowed_counts

My goal is to write this out as a text file. I thought to use something like:

windowed_counts.apply( TextIO.Write.to( "output" ));

but I am having a hard time getting the Coders set up correctly. This is what I thought would work:

    KvCoder kvcoder = KvCoder.of(KvCoder.of(StringUtf8Coder.of(), AvroDeterministicCoder.of(EventSession.class) ), TextualLongCoder.of());
    TextIO.Write.Bound io = TextIO.Write.withCoder( kvcoder );
    windowed_counts.apply( io.to( "output" ));

where TextualLongCoder is my own subclass of AtomicCoder, analogous to TextualIntegerCoder. The EventSession class is annotated to use AvroDeterministicCoder as its default coder.

But with this I get garbled output that includes non-textual characters. Can anybody advise on how to write this particular PCollection out as text? I'm sure there's something obvious I'm missing here...

Did you try creating a transform that converts the PCollection of KV<KV<String, EventSession>, Long> to a PCollection of Strings, and then writing that to a text file?

I found this to be the most flexible approach for my needs.
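
As a rough sketch of that idea, the formatting step can be kept as a plain function that turns one element into one line of text. Everything named here is an assumption for illustration: the EventSession below is a hypothetical stand-in with a single sessionId field (the real class is not shown in the question), and the tab-separated layout is just one possible choice. In the pipeline, such a function would be called from a DoFn applied with ParDo, so that TextIO.Write receives a PCollection<String> and no custom coder is needed.

```java
// Minimal sketch, not the asker's actual classes.
class SessionCountFormatter {

    // Hypothetical stand-in for the question's EventSession class.
    static final class EventSession {
        final String sessionId;

        EventSession(String sessionId) {
            this.sessionId = sessionId;
        }

        @Override
        public String toString() {
            return sessionId;
        }
    }

    // Builds one human-readable, tab-separated line per element.
    // Inside a DoFn this would be called with the parts of the nested KV:
    // the outer key's key, the outer key's value, and the outer value.
    static String formatLine(String key, EventSession session, long count) {
        return key + "\t" + session + "\t" + count;
    }

    public static void main(String[] args) {
        // Example element: KV.of(KV.of("user-1", session), 7L)
        EventSession session = new EventSession("session-42");
        System.out.println(formatLine("user-1", session, 7L));
    }
}
```

Because the text is produced explicitly, the output no longer depends on how KvCoder or AvroDeterministicCoder serialize the element, which is the likely source of the non-textual bytes.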
