
BigQuery job fails with "Bad character (ASCII 0) encountered."

I have a job that is failing with the error

Line:14222274 / Field:1, Bad character (ASCII 0) encountered. Rest of file not processed.

The data is compressed and I have verified that no ASCII 0 character exists in the file. There are only 14222273 lines in the file, so the line number that is printed in the error message is one line past the end of the file. I have other chunks from the same data set which have uploaded successfully, so I suspect that this is either a BQ bug, or the error message is not indicative of the underlying issue. Any help solving this problem would be appreciated. Thanks.

>>> data = open("data.csv").read()
>>> chr(0) in data
False
>>> data[-1]
'\n'

I had a similar problem trying to load a compressed file (stored in Google Cloud Storage) into BigQuery. These are the logs:

File: 0 / Offset:4563403089 / Line:328480 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
File: 0 / Offset:4563403089 / Line:328485 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
File: 0 / Offset:4563403089 / Line:328490 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
File: 0 / Offset:4563403089 / Line:328511 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)
File: 0 / Offset:4563403089 / Line:328517 / Field:21: Bad character (ASCII 0) encountered: field starts with:  (error code: invalid)

To resolve the problem, I removed the ASCII 0 characters from the compressed file. To do that, I executed the following command from a Compute Engine instance with the Cloud SDK installed:
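The pipeline looked roughly like this (a sketch reconstructed from the description below, not the exact original command; the bucket and file names are placeholders):

# download from Storage | decompress | strip NUL (ASCII 0) bytes | upload back to Storage
gsutil cp gs://my-bucket/data.csv.gz - | gunzip | tr -d '\000' | gsutil cp - gs://my-bucket/data_clean.csv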

By using a pipeline, I avoid having to keep everything on the local disk (1 GB compressed + 52 GB uncompressed). The first program gets the compressed file from Storage, the second decompresses it, the third removes the ASCII 0 characters, and the fourth uploads the result back to Storage.

I don't compress the result when uploading it back to Storage, because BigQuery loads an uncompressed file faster. After that, I can load the data into BigQuery without problems.
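For the load step itself, something like the following works with the BigQuery Python client (a minimal sketch, one possible way to submit the load job rather than the poster's actual code; the bucket, dataset and table names are placeholders):

from google.cloud import bigquery

client = bigquery.Client()

# Load an uncompressed CSV straight from Cloud Storage into a table.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    autodetect=True,  # or supply an explicit schema instead
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/data_clean.csv",   # placeholder URI
    "my_dataset.my_table",             # placeholder destination
    job_config=job_config,
)
load_job.result()  # wait for completion; raises if the job failed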

What utility did you use to compress the file?

I saw this issue when I compressed my CSV file in ZIP format (on Windows). Google BigQuery seems to accept only the gzip format.

Make sure to compress your CSV using gzip. If you are on Windows, 7-Zip is a great utility that can compress to gzip.

On Unix, gzip is standard.
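If a separate utility is not handy, Python's standard gzip module produces the same format on any platform (a minimal sketch; the file names are placeholders):

import gzip
import shutil

# Compress data.csv into data.csv.gz (same format as the gzip utility produces).
with open("data.csv", "rb") as src, gzip.open("data.csv.gz", "wb") as dst:
    shutil.copyfileobj(src, dst)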

Bad character (ASCII 0) encountered. Rest of file not processed.

This clearly states that you have a UTF-16 character there which cannot be decoded; UTF-16 encodes every ASCII character with an extra NUL (0x00) byte, which shows up as ASCII 0. The BigQuery service only supports the UTF-8 and Latin-1 text encodings, so the file is expected to be UTF-8 encoded.

There are only 14222273 lines in the file, so the line number that is printed in the error message is one line past the end of the file.

Probably you have a UTF-16 encoded tab character at the end of the file, which cannot be decoded.
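A quick Python session (in the style of the snippet in the question, not taken from the actual file) shows where the ASCII 0 would come from: UTF-16 encodes ASCII characters, including tab, with an extra NUL byte, which a UTF-8 or Latin-1 reader sees as ASCII 0:

>>> "a\tb".encode("utf-16-le")
b'a\x00\t\x00b\x00'
>>> chr(0) in "a\tb".encode("utf-16-le").decode("latin-1")
True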


Solution: use the -a or --ascii flag with the gzip command. It will then be decoded fine by BigQuery.

I have the same problem. I get this error message trying to upload a data file: Failed to create table: Error while reading data, error message: Error detected while parsing row starting at position: 0. Error: Bad character (ASCII 0) encountered.

I am using a MacBook and these steps helped: open the data file you want to upload to BigQuery. Go to File -> Export To -> CSV and click 'Next'. Now try to upload it to BigQuery. SUCCESS!
