Write to GCS using TextIO.write() from Postgres with header
I have a pipeline running on GCP Dataflow that reads from an SQL instance, collects the data in a PCollection, and then writes that PCollection to a CSV file. It seems that when writing the CSV I cannot pass the header at runtime (as a ValueProvider); as given here, the header has to be a String argument.
I have tried passing an empty string and updating it at runtime, but that doesn't work: only the initial empty string is used as the header.
Is there any way to generate the header inside the pipeline and use that string as the header, or to pass the header as a runtime argument?
The TextIO code is attached below:
String header = /*header*/;
PCollection<String> output = /*jdbc result*/;
output
    .apply(
        "Write File(s)",
        TextIO.write()
            .to(options.getFilePath())
            .withSuffix(".csv")
            .withHeader(header)
            .withShardNameTemplate("-S-of-N")
            .withTempDirectory(options.getTempDirectory()));
I don't understand the problem. I think you can pass a program argument as a String:
--header=test
Options in the Java code:
public interface MyOptions extends PipelineOptions {
    @Description("Header")
    String getHeader();
    void setHeader(String value);
}
Then pass it to the withHeader(header) method:
output
    .apply(
        "Write File(s)",
        TextIO.write()
            .to(options.getFilePath())
            .withSuffix(".csv")
            .withHeader(options.getHeader())
            .withShardNameTemplate("-S-of-N")
            .withTempDirectory(options.getTempDirectory()));
If you want, you can also set the header elsewhere in your code.
Currently, withHeader takes an argument that has to be specified at pipeline construction time, so it cannot be provided from PCollection element values.
You might be able to do this by breaking your pipeline into two pipelines, or by generating/discovering the header value in the program that starts the Beam pipeline, before the pipeline is constructed.
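As a rough sketch of that second option, the header string could be assembled in the launching program with plain Java before the pipeline is built. The class and method names (CsvHeaderBuilder, columnLabels, buildHeader, escape) are hypothetical, and the column list in main is made up for illustration. In practice the labels might come from a JDBC ResultSetMetaData obtained from a query run before pipeline construction.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: builds a CSV header String outside the Beam pipeline.
class CsvHeaderBuilder {

    // Reads the column labels from JDBC metadata (JDBC columns are 1-indexed).
    static List<String> columnLabels(ResultSetMetaData meta) throws SQLException {
        List<String> labels = new ArrayList<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) {
            labels.add(meta.getColumnLabel(i));
        }
        return labels;
    }

    // Quotes a field only if it contains a comma, a double quote, or a line break.
    static String escape(String field) {
        boolean needsQuoting = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuoting) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Joins the column names into a single comma-separated header line.
    static String buildHeader(List<String> columns) {
        return columns.stream()
                .map(CsvHeaderBuilder::escape)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Assumed column names, for illustration only.
        String header = buildHeader(Arrays.asList("id", "name", "created_at"));
        System.out.println(header); // prints "id,name,created_at"
        // The resulting String could then be passed to withHeader(header)
        // when the TextIO.write() transform is constructed.
    }
}
```

Because the header is an ordinary String by the time the pipeline is constructed, it fits the existing withHeader(String) call. The limitation is that it is fixed when the launching program runs, not while the job is executing.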