How to import JSON from a file on Cloud Storage into BigQuery

Question

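I'm trying to import a file (json.txt) from Cloud Storage into BigQuery via the API, and errors are thrown. When the same import is done via the web UI, it works without errors (I even set maxBadRecords = 0). Could someone tell me what I'm doing wrong here? Is the code wrong, or do I need to change some setting in BigQuery somewhere?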

The file is a plain-text UTF-8 file with the contents below. I've kept to the docs on BigQuery and JSON imports.

{"person_id":225,"person_name":"John","object_id":1}
{"person_id":226,"person_name":"John","object_id":1}
{"person_id":227,"person_name":"John","object_id":null}
{"person_id":229,"person_name":"John","object_id":1}

On import, the job throws the following error for every single line: "Value cannot be converted to expected type."

    {
    "reason": "invalid",
    "location": "Line:15 / Field:1",
    "message": "Value cannot be converted to expected type."
   },
   {
    "reason": "invalid",
    "location": "Line:16 / Field:1",
    "message": "Value cannot be converted to expected type."
   },
   {
    "reason": "invalid",
    "location": "Line:17 / Field:1",
    "message": "Value cannot be converted to expected type."
   },
  {
    "reason": "invalid",
    "location": "Line:18 / Field:1",
    "message": "Value cannot be converted to expected type."
   },
   {
    "reason": "invalid",
    "message": "Too many errors encountered. Limit is: 10."
   }
  ]
 },
 "statistics": {
  "creationTime": "1384484132723",
  "startTime": "1384484142972",
  "endTime": "1384484182520",
  "load": {
   "inputFiles": "1",
   "inputFileBytes": "960",
   "outputRows": "0",
   "outputBytes": "0"
  }
 }
}

The file is available here: http://www.sendspace.com/file/7q0o37

My code and schema are as follows:

def insert_and_import_table_in_dataset(tar_file, table, dataset=DATASET)
  config = {
    'configuration' => {
      'load' => {
        'sourceUris' => ["gs://test-bucket/#{tar_file}"],
        'schema' => {
          'fields' => [
            { 'name' => 'person_id', 'type' => 'INTEGER', 'mode' => 'nullable' },
            { 'name' => 'person_name', 'type' => 'STRING', 'mode' => 'nullable' },
            { 'name' => 'object_id', 'type' => 'INTEGER', 'mode' => 'nullable' }
          ]
        },
        'destinationTable' => {
          'projectId' => @project_id.to_s,
          'datasetId' => dataset,
          'tableId' => table
        },
        'sourceFormat' => 'NEWLINE_DELIMITED_JSON',
        'createDisposition' => 'CREATE_IF_NEEDED',
        'maxBadRecords' => 10
      }
    }
  }

  result = @client.execute(
    :api_method => @bigquery.jobs.insert,
    :parameters => {
      #'uploadType' => 'resumable',
      :projectId => @project_id.to_s,
      :datasetId => dataset
    },
    :body_object => config
  )

  # upload = result.resumable_upload
  # @client.execute(upload) if upload.resumable?

  puts result.response.body
  json = JSON.parse(result.response.body)
  while true
    job_status = get_job_status(json['jobReference']['jobId'])
    if job_status['status']['state'] == 'DONE'
      puts "DONE"
      return true
    else
      puts job_status['status']['state']
      puts job_status
      sleep 5
    end
  end
end

Could someone tell me what I'm doing wrong? What do I need to fix, and where?

Also, at some point in the future I expect to use compressed files and import from them. Is "tar.gz" OK for that, or do I need to make it ".gz" only?

Thanks in advance for the help. I appreciate it.

Answer 1

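A lot of people (including me) have been hit by the same thing: you are importing a JSON file but not specifying an import format, so it defaults to CSV.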

If you set configuration.load.sourceFormat to NEWLINE_DELIMITED_JSON, you should be good to go.
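As a minimal sketch of where that key sits in the request body, the helper below builds a load configuration with sourceFormat set explicitly. The helper name build_load_config and the project, dataset and table values are hypothetical placeholders; the field names are the ones already used in the question's code.

```ruby
# Hypothetical helper: builds the body for a jobs.insert load request.
# Only the key names come from the question; all values are placeholders.
def build_load_config(source_uri, project_id, dataset_id, table_id, schema_fields)
  {
    'configuration' => {
      'load' => {
        'sourceUris' => [source_uri],
        # If this key is missing, the load is treated as CSV.
        'sourceFormat' => 'NEWLINE_DELIMITED_JSON',
        'schema' => { 'fields' => schema_fields },
        'destinationTable' => {
          'projectId' => project_id,
          'datasetId' => dataset_id,
          'tableId' => table_id
        }
      }
    }
  }
end

fields = [
  { 'name' => 'person_id', 'type' => 'INTEGER', 'mode' => 'nullable' },
  { 'name' => 'person_name', 'type' => 'STRING', 'mode' => 'nullable' },
  { 'name' => 'object_id', 'type' => 'INTEGER', 'mode' => 'nullable' }
]

config = build_load_config('gs://test-bucket/json.txt', 'my-project',
                           'my_dataset', 'people', fields)
puts config['configuration']['load']['sourceFormat']
```

Printing the nested key before submitting the job is a cheap way to confirm that the format actually ends up under configuration.load and not at some other level of the hash.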

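We have a bug filed to make this mistake harder to make, or at least to detect when the file is the wrong type, but I'll bump its priority.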
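In the meantime, a client-side check before submitting the load job can catch a format mismatch early. The following is only a sketch under the assumption that the file is small enough to read locally; the function name non_json_object_lines is hypothetical and is not part of any BigQuery client library.

```ruby
require 'json'

# Returns the 1-based numbers of non-blank lines that do not parse as a
# single JSON object. An empty result suggests newline-delimited JSON.
def non_json_object_lines(text)
  bad = []
  text.each_line.with_index(1) do |line, number|
    next if line.strip.empty?
    begin
      bad << number unless JSON.parse(line).is_a?(Hash)
    rescue JSON::ParserError
      bad << number
    end
  end
  bad
end

ndjson = <<~DATA
  {"person_id":225,"person_name":"John","object_id":1}
  {"person_id":227,"person_name":"John","object_id":null}
DATA
csv = "person_id,person_name,object_id\n225,John,1\n"

p non_json_object_lines(ndjson) # prints []
p non_json_object_lines(csv)    # prints [1, 2]
```

If the list is empty for a file that is about to be loaded without sourceFormat set to NEWLINE_DELIMITED_JSON, the configuration rather than the data is the likely culprit.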

Solution 1
3 accepted 2013-11-15 22:57:08
