BigQuery won't skip header row of CSV when loading a table with Python

I am using Python 3.8 to load a CSV file into BigQuery as a new table. I have the schema defined, autodetect off, and skip_leading_rows = 1. When I run the file, I get the following error:

BadRequest: 400 Error while reading data, error message: Could not parse 'Dollar Sales' as DOUBLE for field Dollar_Sales (position 13) starting at location 3913004 with message 'Unable to parse'

My code looks like this:

dataset_ref = client.dataset(dataset_id)
table_ref = dataset_ref.table(table_id)
job_config = bigquery.LoadJobConfig()
job_config.source_format = bigquery.SourceFormat.CSV
job_config.skip_leading_rows = 1
job_config.autodetect = True
job_config.schema = [
     bigquery.SchemaField("Time", "STRING"),
     bigquery.SchemaField("product", "STRING"),
     bigquery.SchemaField("UPC_13_digit", "STRING"),
     bigquery.SchemaField("Brand_Name", "STRING"),
     bigquery.SchemaField("TGT_Private_Label_National_Brand_Value", "STRING"),
     bigquery.SchemaField("Product_Type_Group", "STRING"),
     bigquery.SchemaField("Primary_Package_Group", "STRING"),
     bigquery.SchemaField("Aisle_Name", "STRING"),
     bigquery.SchemaField("TGT_DPCI_Value", "STRING"),
     bigquery.SchemaField("TGT_Class_Value", "STRING"),
     bigquery.SchemaField("TGT_Subclass_Value", "STRING"),
     bigquery.SchemaField("TGT_Major_Brand_Value", "STRING"),
     bigquery.SchemaField("TGT_All_Brands_Value", "STRING"),
     bigquery.SchemaField("Dollar_Sales", "FLOAT")
]

with open(localfilename, "rb") as source_file:
    job = client.load_table_from_file(source_file, table_ref, job_config=job_config)

job.result()

print("Loaded {} rows into {}:{}.".format(job.output_rows, dataset_id, table_id))

It gives an error at the "Dollar_Sales" column, so I assume it's not actually skipping the header row, which is why it can't parse the header "Dollar Sales": it's a string. When I tested turning autodetect on and not defining the schema, it still included the header row and every column in my table was a string. Any ideas on why the leading row is not skipped? Also, I'm confused about "location 3913004" in the error message, as my CSV only has about 39k rows. Thanks

EDIT: I should mention that the values in the "Dollar Sales" column of the CSV I am loading are numerical, and I need to keep them that way.
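
On the "location 3913004" point: as far as I know, the location in this kind of BigQuery load error is a byte offset into the file rather than a row number, so a value in the millions is plausible for a CSV with about 39k rows. A minimal sketch for looking at what sits at that offset, assuming that reading is correct; `peek_at_offset` is a hypothetical helper name, not part of the BigQuery client:

```python
def peek_at_offset(path, offset, context=200):
    """Return the raw bytes around a byte offset in a file.

    Assumption: `offset` is the "location" number from the error
    message, interpreted as a byte offset from the start of the file.
    """
    start = max(0, offset - context)
    with open(path, "rb") as f:
        f.seek(start)
        # Read `context` bytes on either side of the offset
        # (fewer on the left if the offset is near the file start).
        return f.read(2 * context)
```

Calling `print(peek_at_offset(localfilename, 3913004))` would show whether a second header line appears in the middle of the data, which skip_leading_rows = 1 would not remove.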

You said: "it can't parse the header "Dollar Sales" because it's a string"

However, the program says: "Could not parse 'Dollar Sales' as DOUBLE"

This means it's a datatype issue: DOUBLE is a numeric type, and you even told it the field was a FLOAT. If it needs to be a STRING, perhaps set it to that instead? I am assuming it is supposed to be a numerical value since it is "Dollar Sales", but keep in mind that programs will fail if you pass one datatype where another is expected; you can't add a string to a number, for example. So if it throws an error like this, a value is of the wrong type somewhere, or the place it is going doesn't accept that type.

EDIT - I just noticed you left off a ] close bracket. I don't know enough about Python to know whether it is necessary, or whether you cut it (or other code) out on purpose or by accident, but perhaps if it's not a datatype issue it's a termination issue?
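
To find where a value of the wrong type is coming from, one option is to scan the CSV locally for records whose Dollar Sales field does not parse as a number. This is only a sketch under stated assumptions: `find_unparseable_rows` is a hypothetical helper, Python's `float()` is used as a rough stand-in for BigQuery's FLOAT parsing (the two are not identical), and empty fields are skipped on the assumption that they would load as NULL.

```python
import csv


def find_unparseable_rows(path, column_index, skip_leading_rows=1):
    """List (record_number, value) pairs whose field is not a float.

    Record numbers are 1-based and count CSV records, header included.
    """
    bad = []
    with open(path, newline="") as f:
        for record_number, row in enumerate(csv.reader(f), start=1):
            if record_number <= skip_leading_rows:
                continue  # mirrors job_config.skip_leading_rows
            value = row[column_index] if column_index < len(row) else ""
            if value == "":
                continue  # assumption: empty fields load as NULL
            try:
                float(value)
            except ValueError:
                bad.append((record_number, value))
    return bad
```

With the schema in the question, Dollar_Sales is the fourteenth field, which matches "position 13" if positions are counted from zero, so `find_unparseable_rows(localfilename, 13)` would be the call to try. A result containing 'Dollar Sales' would point to an extra header line further down the file.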
