
Problem loading ISO-8859-1 into BigQuery using DataFlow (Apache Beam)

I'm trying to load an ISO-8859-1 file into BigQuery using DataFlow. I've built a template with Apache Beam Java. Everything works well, but when I check the content of the BigQuery table I see that some characters, such as 'ñ' or accented letters like 'á' and 'é', haven't been stored properly: they have been stored as a replacement character (�).

I've tried several charset conversions before writing to BigQuery. I've also created a custom ISOCoder and passed it to the pipeline using the setCoder() method, but nothing works.

Does anyone know whether it is possible to load this kind of file into BigQuery using Apache Beam, or is only UTF-8 supported?

Thanks in advance for your help.

This feature is currently not available in the Beam Java SDK. In Python it seems to be possible by passing additional_bq_parameters to WriteToBigQuery; see: https://github.com/apache/beam/blob/master/sdks/python/apache_beam/io/gcp/bigquery.py#L177
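
As a side note that is not part of the answer above: the symptom in the question (accented characters turning into �) is what happens when ISO-8859-1 bytes are decoded as UTF-8. The sketch below uses only the JDK to show the difference and one possible fix, which is to decode the raw bytes as ISO-8859-1 yourself before building the rows. The class and method names (Latin1Decode, decodeLatin1, toUtf8) are hypothetical, and getting the raw bytes out of a Beam source is assumed and not shown.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of Beam or BigQuery.
class Latin1Decode {

    // Decode raw ISO-8859-1 bytes into a Java String.
    static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Re-encode a String as UTF-8 bytes, the encoding BigQuery expects by default.
    static byte[] toUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // 'ñ' and 'á' as they appear in an ISO-8859-1 file: 0xF1, 0xE1.
        byte[] raw = {(byte) 0xF1, (byte) 0xE1};

        // Decoding with the wrong charset: these bytes are not valid UTF-8,
        // so the decoder substitutes U+FFFD (the replacement character).
        String wrong = new String(raw, StandardCharsets.UTF_8);

        // Decoding with the charset the file was actually written in.
        String right = decodeLatin1(raw);

        System.out.println(wrong.contains("\uFFFD"));     // replacement char present
        System.out.println(right.equals("\u00F1\u00E1")); // 'ñ' followed by 'á'
        System.out.println(toUtf8(right).length);         // two bytes per character
    }
}
```

If the lines are decoded this way before they are turned into table rows, the strings handed to the BigQuery sink already contain the right characters. Whether this fits a particular template depends on how the file is read, so treat it as a starting point, not a confirmed solution.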
