
Write to GCS using TextIO.write() from postgres with header

I am running a pipeline on GCP Dataflow that reads from an SQL instance, collects the data in a PCollection, and then writes that PCollection to a CSV file. It seems that when writing the CSV I cannot pass the header at runtime (as a ValueProvider): as given here, the header has to be a String argument.

I have tried passing an empty string and updating it at runtime, but that doesn't work. Only the initial empty string is used as the header.

Is there any way to generate the header inside the pipeline and use that string as the header, or to pass the header as a runtime argument?

The TextIO code is below:

String header = /*header*/;
PCollection<String> output = /*jdbc result*/;

output
    .apply(
        "Write File(s)",
        TextIO.write()
            .to(options.getFilePath())
            .withSuffix(".csv")
            .withHeader(header)
            .withShardNameTemplate("-S-of-N")
            .withTempDirectory(options.getTempDirectory()));

I'm not sure I understand the problem. I think you can pass the header as a program argument of type String:

--header=test

Options in Java code:

public interface MyOptions extends PipelineOptions {

    @Description("Header")
    String getHeader();

    void setHeader(String value);
}

Then pass it in the withHeader(header) method:

output
    .apply(
        "Write File(s)",
        TextIO.write()
            .to(options.getFilePath())
            .withSuffix(".csv")
            .withHeader(options.getHeader())
            .withShardNameTemplate("-S-of-N")
            .withTempDirectory(options.getTempDirectory()));

If you prefer, you can also build the header string elsewhere in your code and pass it to withHeader.

Currently the value passed to withHeader has to be specified at pipeline construction time, so it cannot be derived from PCollection element values.

You might be able to do this by breaking your pipeline into two pipelines, or generating/discovering the header value within your program from where the Beam pipeline is started.
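As a rough illustration of the second suggestion, here is a minimal sketch of building the header string in the launcher program before the pipeline is constructed. It uses only the JDK; the class name CsvHeaderBuilder and its methods are hypothetical helpers, not part of Beam, and the quoting rule is an assumption about what the CSV consumer expects.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not part of Beam): builds a CSV header line
// in the launcher program, before the pipeline is constructed.
final class CsvHeaderBuilder {

    // Quote a field only if it contains a comma, a double quote, or a line break;
    // embedded double quotes are doubled.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join column names into a single comma-separated header line.
    static String fromColumns(List<String> columns) {
        return columns.stream()
                .map(CsvHeaderBuilder::escape)
                .collect(Collectors.joining(","));
    }

    // Read the column labels from JDBC metadata (column indexes are 1-based).
    static String fromMetaData(ResultSetMetaData meta) throws SQLException {
        List<String> columns = new ArrayList<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            columns.add(meta.getColumnLabel(i));
        }
        return fromColumns(columns);
    }

    public static void main(String[] args) {
        // Example column names; in practice these would come from the query metadata.
        System.out.println(fromColumns(List.of("id", "name", "created_at")));
    }
}
```

The launcher could run the same query over a plain JDBC connection, pass the result set's metadata to fromMetaData, and hand the resulting string to withHeader(...) before calling run() on the pipeline. This only helps when the pipeline is constructed at launch time, since the header is still fixed once the graph is built.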

