
Referenced variable 'ro_sub_ros.$is_not_null' has levels of 1, while the corresponding field path to Parquet column has 0 repeated fields

BigQuery Python: google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: Schema mismatch: referenced variable 'ro_sub_ros.$is_not_null' has array levels of 1, while the corresponding field path to Parquet column has 0 repeated fields.

My original data looks like this:

testData = {
    "ro_user_email": "tech@techietech.com",
    "ro_account_id": "23402042",
    "ro_sub_account_id": "34020334",
    "ro_name": "Test RO",
    "ro_number": "1304340",
    "ro_currency": {"label":"USD","value":"USD"},
    "ro_dates": {"from":now,"to":now},
    "ro_status": "draft",
    "ro_operation_timestamp": pd.Timestamp(now),
    "ro_billing_cycle": {"label":"Fortnightly","value":"Fortnightly"},
    "ro_sub_ros": [
        {
            "sub_ro_id": "2323",
            "valid":False,
            "sub_ro_name": "Testing",
            "sub_ro_dates":{"from":now,"to":now},
            "sub_ro_budget": 1203302.22,
            "sub_ro_revenue_price":1202302.22,
            "sub_ro_revenue_selected": {"label":"Fortnightly","value":"Fortnightly"},
            "sub_ro_revenue_model_selected": {"label":"Fortnightly","value":"Fortnightly"},
            "sub_ro_campaigns_selected": [{"label":"Fortnightly","value":"Fortnightly"}],
            "sub_ro_ios_selected": [{"label":"Fortnightly","value":"Fortnightly"}],
            "sub_ro_client_id": [{"label":"Fortnightly","value":"Fortnightly"}],
            "sub_ro_ids_selected": [{"label":"Fortnightly","value":"Fortnightly"}],
            "sub_ro_pixels_selected": [{"label":"Fortnightly","value":"Fortnightly"}],
            "kpi_1_metric_selected": {"label":"Fortnightly","value":"Fortnightly"},
            "attribution_model_selected": {"label":"Fortnightly","value":"Fortnightly"},
            "kpi_window_selected": {"label":"Fortnightly","value":"Fortnightly"},
            "deepMetrics_selected": {"label":"Fortnightly","value":"Fortnightly"},
            "sub_ro_kpi_goal":"ROI"

        }
    ],

}

And here's how I created my BQ schema:

schema = [
        bigquery.SchemaField("ro_user_email", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("ro_account_id", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("ro_sub_account_id", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("ro_name", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("ro_number", "STRING", mode="REQUIRED"),
        bigquery.SchemaField("ro_currency", 
        "STRUCT", 
        mode="REQUIRED",
        fields=[
            bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("value", "STRING", mode="REQUIRED"),
            ]
            ),

        bigquery.SchemaField("ro_dates", 
        "STRUCT", 
        mode="REQUIRED",
        fields=[
            bigquery.SchemaField("from", "DATE", mode="REQUIRED"),
            bigquery.SchemaField("to", "DATE", mode="REQUIRED"),
            ]
            ),

        bigquery.SchemaField("ro_status","STRING", mode="REQUIRED"),
        bigquery.SchemaField("ro_operation_timestamp","TIMESTAMP", mode="REQUIRED"),

        bigquery.SchemaField("ro_billing_cycle", 
        "STRUCT", 
        mode="REQUIRED",
        fields=[
            bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("value", "STRING", mode="REQUIRED"),
            ]
            ),
        
        bigquery.SchemaField(
        "ro_sub_ros",
        "RECORD",
        mode="REPEATED",
        fields=[
            bigquery.SchemaField("sub_ro_id", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("valid", "BOOL", mode="REQUIRED"),
            bigquery.SchemaField("sub_ro_name", "STRING", mode="REQUIRED"),
            bigquery.SchemaField("sub_ro_dates", "STRUCT", mode="REQUIRED",
            fields=[
                bigquery.SchemaField("from", "DATE", mode="REQUIRED"),
                bigquery.SchemaField("to", "DATE", mode="REQUIRED"),
                ]
                ),
            bigquery.SchemaField("sub_ro_budget", "FLOAT", mode="REQUIRED"),
            bigquery.SchemaField("sub_ro_revenue_price", "FLOAT", mode="REQUIRED"),
            bigquery.SchemaField("sub_ro_revenue_selected",
            "STRUCT", mode="REQUIRED",
            fields=[
                bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
                bigquery.SchemaField("value", "STRING", mode="REQUIRED"),

            ]
            ),
            bigquery.SchemaField("sub_ro_revenue_model_selected",
            "STRUCT", mode="REQUIRED",
            fields=[
                bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
                bigquery.SchemaField("value", "STRING", mode="REQUIRED"),

            ]
            ),
            bigquery.SchemaField("sub_ro_campaigns_selected","RECORD",
            mode="REPEATED",
            fields=[
                bigquery.SchemaField("model_list",
            "STRUCT", mode="REQUIRED",
            fields=[
                bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
                bigquery.SchemaField("value", "STRING", mode="REQUIRED"),
            ]
            )
            ]),

            bigquery.SchemaField("sub_ro_ios_selected","RECORD",
            mode="REPEATED",
            fields=[
                bigquery.SchemaField("model_list",
            "STRUCT", mode="REQUIRED",
            fields=[
                bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
                bigquery.SchemaField("value", "STRING", mode="REQUIRED"),
            ]
            )
            ]),

            bigquery.SchemaField("sub_ro_client_id","RECORD",
            mode="REPEATED",
            fields=[
                bigquery.SchemaField("model_list",
            "STRUCT", mode="REQUIRED",
            fields=[
                bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
                bigquery.SchemaField("value", "STRING", mode="REQUIRED"),
            ]
            )
            ]),

            #

            bigquery.SchemaField("sub_ro_ids_selected","RECORD",
            mode="REPEATED",
            fields=[
                bigquery.SchemaField("model_list",
            "STRUCT", mode="REQUIRED",
            fields=[
                bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
                bigquery.SchemaField("value", "STRING", mode="REQUIRED"),
            ]
            )
            ]),

            bigquery.SchemaField("sub_ro_pixels_selected","RECORD",
            mode="REPEATED",
            fields=[
                bigquery.SchemaField("model_list",
            "STRUCT", mode="REQUIRED",
            fields=[
                bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
                bigquery.SchemaField("value", "STRING", mode="REQUIRED"),
            ]
            )
            ]),

            bigquery.SchemaField("kpi_1_metric_selected",
            "STRUCT", mode="REQUIRED",
            fields=[
                bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
                bigquery.SchemaField("value", "STRING", mode="REQUIRED"),

            ]
            ),

            bigquery.SchemaField("attribution_model_selected",
            "STRUCT", mode="REQUIRED",
            fields=[
                bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
                bigquery.SchemaField("value", "STRING", mode="REQUIRED"),

            ]
            ),

            bigquery.SchemaField("kpi_window_selected",
            "STRUCT", mode="REQUIRED",
            fields=[
                bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
                bigquery.SchemaField("value", "STRING", mode="REQUIRED"),

            ]
            ),

            bigquery.SchemaField("deepMetrics_selected",
            "STRUCT", mode="REQUIRED",
            fields=[
                bigquery.SchemaField("label", "STRING", mode="REQUIRED"),
                bigquery.SchemaField("value", "STRING", mode="REQUIRED"),

            ]
            ),

            bigquery.SchemaField("sub_ro_kpi_goal", "STRING", mode="REQUIRED"),




        ],
    )
    ]

When I try to upload this data using the BigQuery client library, I get this error:

job_config = bigquery.LoadJobConfig(schema=schema)
return bq.client.load_table_from_dataframe(
    df, tablename, job_config=job_config
).result()

throws:

google.api_core.exceptions.BadRequest: 400 Error while reading data, error message: Schema mismatch: referenced variable 'ro_sub_ros.$is_not_null' has array levels of 1, while the corresponding field path to Parquet column has 0 repeated fields.

I'm not sure what's going wrong here. In case my schema is too big and bulky to analyze, can someone show a minimal example of uploading a REPEATED RECORD to Google BigQuery using the client library and a pandas DataFrame?

You can consider checking the following options.

Check that the BigQuery schema is correct. Here is an example that uses a repeated record; see the official documentation for details.

from google.cloud import bigquery

client = bigquery.Client()
project = client.project
# "my_dataset" is a placeholder; use a dataset that exists in your project.
dataset_ref = bigquery.DatasetReference(project, "my_dataset")

schema = [
    bigquery.SchemaField("id", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("first_name", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("last_name", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("dob", "DATE", mode="NULLABLE"),
    bigquery.SchemaField(
        "addresses",
        "RECORD",
        mode="REPEATED",
        fields=[
            bigquery.SchemaField("status", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("address", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("city", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("state", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("zip", "STRING", mode="NULLABLE"),
            bigquery.SchemaField("numberOfYears", "STRING", mode="NULLABLE"),
        ],
    ),
]
table_ref = dataset_ref.table("my_table")
table = bigquery.Table(table_ref, schema=schema)
table = client.create_table(table)  # API request
print("Created table {}".format(table.full_table_id))
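One mismatch worth checking in the question: the data stores each sub_ro_*_selected list as plain {"label", "value"} objects, while the schema wraps those two fields in an extra "model_list" record. The sketch below is a plain-Python shape check that makes such differences visible. The dict-based spec format and the check_shape helper are hypothetical stand-ins for bigquery.SchemaField, not part of the client library, and the check is not guaranteed to explain the Parquet error itself.

```python
def check_shape(value, spec, path="row"):
    """Return a list of mismatches between a Python value and a schema spec.

    spec is a plain dict (a hypothetical stand-in for bigquery.SchemaField):
      {"mode": "REPEATED" | "REQUIRED" | "NULLABLE", "fields": {name: spec}}
    "fields" is present only for RECORD/STRUCT columns.
    """
    problems = []
    if spec.get("mode") == "REPEATED":
        if not isinstance(value, list):
            return [f"{path}: expected a list for a REPEATED column"]
        # Each list item is checked against the same spec as a single value.
        item_spec = dict(spec, mode="REQUIRED")
        for i, item in enumerate(value):
            problems += check_shape(item, item_spec, f"{path}[{i}]")
        return problems
    if value is None:
        if spec.get("mode") == "REQUIRED":
            problems.append(f"{path}: REQUIRED value is missing")
        return problems
    fields = spec.get("fields")
    if fields is not None:
        if not isinstance(value, dict):
            return [f"{path}: expected a dict for a RECORD column"]
        for name, sub_spec in fields.items():
            problems += check_shape(value.get(name), sub_spec, f"{path}.{name}")
        for name in sorted(value.keys() - fields.keys()):
            problems.append(f"{path}.{name}: not declared in the schema")
    return problems


label_value = {
    "mode": "REQUIRED",
    "fields": {"label": {"mode": "REQUIRED"}, "value": {"mode": "REQUIRED"}},
}
# Shape declared in the question: REPEATED RECORD -> model_list -> label/value.
declared = {"mode": "REPEATED", "fields": {"model_list": label_value}}
# Shape that matches the question's data: REPEATED RECORD -> label/value.
flat = {"mode": "REPEATED", "fields": label_value["fields"]}

data = [{"label": "Fortnightly", "value": "Fortnightly"}]
problems = check_shape(data, declared, "sub_ro_campaigns_selected")
for problem in problems:
    print(problem)
```

Against the declared shape the check reports the missing model_list and the two undeclared keys; against the flat shape it reports nothing. Either the schema or the data can be changed, as long as both end up with the same nesting.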

Check that the record syntax is correct. Here is an example with values for the schema above.

{"id":"1","first_name":"John","last_name":"Doe","dob":"1968-01-22","addresses":[{"status":"current","address":"123 First Avenue","city":"Seattle","state":"WA","zip":"11111","numberOfYears":"1"},{"status":"previous","address":"456 Main Street","city":"Portland","state":"OR","zip":"22222","numberOfYears":"5"}]}

{"id":"2","first_name":"Jane","last_name":"Doe","dob":"1980-10-16","addresses":[{"status":"current","address":"789 Any Avenue","city":"New York","state":"NY","zip":"33333","numberOfYears":"2"},{"status":"previous","address":"321 Main Street","city":"Hoboken","state":"NJ","zip":"44444","numberOfYears":"3"}]}
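Rows shaped like the two above can also be sent as JSON rather than through a DataFrame, which avoids the Parquet path the error message refers to. The following is a minimal sketch, assuming the rows are plain dicts and lists. The to_json_safe and load_rows helpers are illustrative names, not library functions; client.load_table_from_json is the client method used for the upload. Note that a DATE column expects a value like "1968-01-22", so a full datetime in a DATE field would still need trimming first.

```python
import datetime


def to_json_safe(value):
    """Recursively copy a row, turning dates and datetimes into ISO strings."""
    if isinstance(value, dict):
        return {key: to_json_safe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_json_safe(item) for item in value]
    # datetime.datetime (and pandas.Timestamp) are subclasses of datetime.date.
    if isinstance(value, datetime.date):
        return value.isoformat()
    return value


def load_rows(client, table_id, rows, schema):
    """Load nested rows with a JSON load job (sketch; needs credentials)."""
    # Imported lazily so the helper above works without the library installed.
    from google.cloud import bigquery

    job_config = bigquery.LoadJobConfig(schema=schema)
    job = client.load_table_from_json(
        [to_json_safe(row) for row in rows], table_id, job_config=job_config
    )
    return job.result()  # waits for the load job to finish


row = {
    "id": "1",
    "dob": datetime.date(1968, 1, 22),
    "addresses": [{"status": "current", "city": "Seattle"}],
}
safe = to_json_safe(row)
print(safe)
```

If the data already lives in a DataFrame df, passing df.to_dict(orient="records") as rows should give the same list-of-dicts input.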

Consider using schema auto-detection in your Python code, similar to this example. See the documentation for more details.

from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

# TODO(developer): Set table_id to the ID of the table to create.
table_id = "your-project.your_dataset.your_table_name"

job_config = bigquery.LoadJobConfig(
    autodetect=True, source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON
)
uri = "gs://cloud-samples-data/bigquery/us-states/us-states.json"
load_job = client.load_table_from_uri(
    uri, table_id, job_config=job_config
)  # Make an API request.
load_job.result()  # Waits for the job to complete.
destination_table = client.get_table(table_id)
print("Loaded {} rows.".format(destination_table.num_rows))

You can validate the JSON format on this page.
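The same check can be done locally with the standard library. This is a rough sketch, assuming newline-delimited JSON input; check_ndjson and its repeated_fields parameter are hypothetical names chosen for illustration. It reports lines that are not valid JSON objects and fields that should be arrays (REPEATED) but are not.

```python
import json


def check_ndjson(text, repeated_fields=("addresses",)):
    """Return (line_number, message) pairs for lines that look wrong."""
    errors = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # blank lines between records are skipped
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append((number, f"invalid JSON: {exc.msg}"))
            continue
        if not isinstance(record, dict):
            errors.append((number, "each line must be a JSON object"))
            continue
        for name in repeated_fields:
            # A REPEATED column must be a JSON array, even with one element.
            if name in record and not isinstance(record[name], list):
                errors.append((number, f"{name!r} must be a JSON array"))
    return errors


sample = "\n".join([
    '{"id": "1", "addresses": [{"city": "Seattle"}]}',
    '{"id": "2", "addresses": {"city": "Hoboken"}}',
    '{"id": "3", "addresses": [}',
])
errors = check_ndjson(sample)
for number, message in errors:
    print(f"line {number}: {message}")
```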


