BigQuery: Load Data into EU Dataset from GCS

In the past I have successfully loaded data into US-hosted BigQuery datasets from CSV files in US-hosted GCS buckets. We have since decided to move our BigQuery data to the EU, so I created a new dataset with the EU location selected. I have successfully populated the tables that were small enough to upload from my machine at home, but two tables are far too large for that, so I would like to load them from files in GCS. I have tried loading from both a US-hosted GCS bucket and an EU-hosted GCS bucket (thinking that bq load might not like to cross regions), but the load fails every time.

Below is the error detail I get from the bq command line (500, Internal Error). Does anyone know why this might be happening? Has anyone else successfully loaded data from GCS into an EU-hosted BigQuery dataset?

{
  "configuration": {
    "load": {
      "destinationTable": {
        "datasetId": "######", 
        "projectId": "######", 
        "tableId": "test"
      }, 
      "schema": {
        "fields": [
          {
            "name": "test_col", 
            "type": "INTEGER"
          }
        ]
      }, 
      "sourceFormat": "CSV", 
      "sourceUris": [
        "gs://######/test.csv"
      ]
    }
  }, 
  "etag": "######", 
  "id": "######", 
  "jobReference": {
    "jobId": "job_Y4U58uTyxitsvbgljFi2x534N7M", 
    "projectId": "######"
  }, 
  "kind": "bigquery#job", 
  "selfLink": "https://www.googleapis.com/bigquery/v2/projects/######", 
  "statistics": {
    "creationTime": "1445336673213", 
    "endTime": "1445336674738", 
    "startTime": "1445336674738"
  }, 
  "status": {
    "errorResult": {
      "message": "An internal error occurred and the request could not be completed.", 
      "reason": "internalError"
    }, 
    "errors": [
      {
        "message": "An internal error occurred and the request could not be completed.", 
        "reason": "internalError"
      }
    ], 
    "state": "DONE"
  }, 
  "user_email": "######"
}
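When a load fails with a generic internalError like this one, it helps to read the job resource systematically rather than by eye. The following is a minimal Python sketch, using only the standard library, that pulls the state, error reason and source URIs out of a job resource shaped like the one above (for example, the output of bq show --format=prettyjson -j JOB_ID). The function name summarize_job_status and the trimmed sample are illustrative only, not part of any Google library.

```python
import json
from typing import Any, Dict

# Trimmed copy of the failed job resource from the question (IDs redacted as in the post).
SAMPLE_JOB = """
{
  "jobReference": {"jobId": "job_Y4U58uTyxitsvbgljFi2x534N7M", "projectId": "######"},
  "configuration": {"load": {"sourceUris": ["gs://######/test.csv"]}},
  "status": {
    "errorResult": {
      "message": "An internal error occurred and the request could not be completed.",
      "reason": "internalError"
    },
    "errors": [
      {
        "message": "An internal error occurred and the request could not be completed.",
        "reason": "internalError"
      }
    ],
    "state": "DONE"
  }
}
"""


def summarize_job_status(job_json: str) -> Dict[str, Any]:
    """Extract triage fields from the JSON text of a BigQuery job resource (hypothetical helper)."""
    job = json.loads(job_json)
    status = job.get("status", {})
    # state == "DONE" only means the job has finished; a present errorResult marks a failure.
    error_result = status.get("errorResult")
    error = error_result or {}
    load = job.get("configuration", {}).get("load", {})
    return {
        "job_id": job.get("jobReference", {}).get("jobId"),
        "state": status.get("state"),
        "failed": error_result is not None,
        "reason": error.get("reason"),
        "message": error.get("message"),
        "error_count": len(status.get("errors", [])),
        # Listing the source URIs makes it easy to check which bucket(s) the job read from.
        "source_uris": load.get("sourceUris", []),
    }


if __name__ == "__main__":
    summary = summarize_job_status(SAMPLE_JOB)
    print(summary["state"], summary["reason"], summary["source_uris"])
```

Note that a job reaching state DONE is not the same as succeeding; the presence of errorResult is what distinguishes a failed job, which is why the sketch reports both.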

After searching through other related questions on Stack Overflow, I eventually realised that I had set my GCS bucket's location to the EUROPE-WEST1 region rather than the EU multi-region location. Things are now working as expected.
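The fix comes down to a location check that is easy to run before starting a large load. Below is a minimal Python sketch of such a pre-flight check, assuming you have already looked up the bucket's location (e.g. the "Location constraint" line from gsutil ls -L -b gs://BUCKET) and the dataset's location (e.g. the location field from bq show --format=prettyjson DATASET). It encodes only the rule observed in this thread, an exact match between bucket and dataset location; BigQuery's location rules may have changed since this was posted, so treat it as an illustration rather than a reference. check_load_locations is a hypothetical helper name.

```python
from typing import Optional


def check_load_locations(bucket_location: str, dataset_location: str) -> Optional[str]:
    """Return None if the locations match, else a message describing the mismatch.

    Hypothetical helper encoding only the constraint observed in this thread:
    loading into an EU dataset worked from a bucket in the EU multi-region but
    failed from a US bucket and from a europe-west1 regional bucket.
    """
    # Location strings can appear in different cases (e.g. "EU", "europe-west1",
    # "EUROPE-WEST1"), so normalize whitespace and case before comparing.
    bucket = bucket_location.strip().upper()
    dataset = dataset_location.strip().upper()
    if bucket == dataset:
        return None
    return (
        f"bucket location {bucket} does not match dataset location {dataset}; "
        f"copy the source files to a bucket created in {dataset} before running bq load"
    )


if __name__ == "__main__":
    # The three bucket locations the poster tried, checked against an EU dataset.
    for location in ("US", "EUROPE-WEST1", "EU"):
        print(location, "->", check_load_locations(location, "EU") or "OK")
```

Returning a message instead of raising keeps the check usable in a script that validates several buckets and reports all mismatches at once.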

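Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.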

 
© 2020-2024 STACKOOM.COM