
Problem loading ISO-8859-1 into BigQuery using DataFlow (Apache Beam)

I'm trying to load an ISO-8859-1 file into BigQuery using DataFlow. I've built a template with Apache Beam Java. Everything works well, but when I check the content of the BigQuery table I see that some characters, such as 'ñ' or accented vowels like 'á' and 'é', haven't been stored properly; they show up as garbled replacement characters instead.

I've tried several charset changes before writing into BigQuery. I've also created a special ISOCoder and passed it to the pipeline using setCoder(), but nothing works.
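For context, the corruption pattern described above is typical of ISO-8859-1 bytes being decoded as if they were UTF-8, which is the default assumption of Beam's text sources. A minimal stdlib-only sketch of the mechanism, outside of Beam:

```python
# Demonstrates why ISO-8859-1 text decoded as UTF-8 produces garbled output.
# Stdlib-only illustration; no Beam involved.
raw = "España año sí".encode("iso-8859-1")  # bytes as they sit in the source file

# Decoding with the wrong charset mangles every accented character
# (each invalid byte becomes the U+FFFD replacement character):
wrong = raw.decode("utf-8", errors="replace")
print(wrong)  # Espa�a a�o s�

# Decoding with the correct charset recovers the original text:
right = raw.decode("iso-8859-1")
print(right)  # España año sí
```

Any single-byte accented character (0xE1 'á', 0xE9 'é', 0xF1 'ñ', …) is an invalid UTF-8 sequence on its own, which is why exactly those characters break while plain ASCII survives.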

Does anyone know whether it's possible to load this kind of file into BigQuery using Apache Beam, or is only UTF-8 supported?

Thanks in advance for your help.

This feature is currently not available in the Java SDK of Beam. In Python it seems to be possible by using additional_bq_parameters with WriteToBigQuery, see: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L177
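When the sink itself cannot be told the source encoding, a common workaround is to do the decoding explicitly upstream: read the file's raw bytes and decode them as ISO-8859-1 yourself, so the BigQuery write only ever receives well-formed Unicode strings. A stdlib-only sketch of that idea (decode_latin1_lines is a hypothetical helper name, not a Beam API; in a real pipeline the same decode would live in a DoFn or a custom source):

```python
import os
import tempfile

def decode_latin1_lines(path):
    """Hypothetical helper: read a file's raw bytes and decode each line
    as ISO-8859-1, yielding proper Unicode strings. In a real Beam
    pipeline this decoding step would run before the BigQuery write."""
    with open(path, "rb") as f:
        for line in f.read().splitlines():
            yield line.decode("iso-8859-1")

# Quick check with a temp file written in ISO-8859-1:
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as f:
    f.write("año;ñandú\n".encode("iso-8859-1"))
    path = f.name

print(list(decode_latin1_lines(path)))  # ['año;ñandú']
os.remove(path)
```

Once the data is a proper str, Beam's normal string handling (which is UTF-8 internally) and the BigQuery sink store it correctly.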
