
Load JSON files in Google Cloud Storage into a BigQuery table

I am trying to do this with the client library, using Python.

The problem I am facing is that the TIMESTAMP fields in the JSON files are in Unix epoch format, and BigQuery can't detect that.

According to the documentation:

[screenshot of the BigQuery TIMESTAMP documentation]

So I wonder what to do.

I thought about changing the JSON manually before loading it into the BigQuery table.

Or maybe there is an automatic conversion on the BigQuery side?

I searched across the internet and could not find anything useful yet.

Thanks in advance for any support.

You have two solutions:

  • Either you update the format before the BigQuery integration
  • Or you update the format after the BigQuery integration

Before

Before means updating your JSON (manually or by script), or updating it in the process that loads the JSON into BigQuery (such as Dataflow).

I personally don't like this; file handling is never fun or efficient.
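That said, if a small script is acceptable, the transformation can be as simple as rewriting each record of a newline-delimited JSON file so the epoch value becomes a timestamp string that BigQuery auto-detects on load. A minimal sketch; the field name `created_at` and the assumption that the epoch is in seconds are mine, not from the question:

```python
import json
from datetime import datetime, timezone

def convert_epoch_field(record, field="created_at"):
    """Replace a Unix-epoch value (seconds) in `record` with an
    RFC 3339-style string that BigQuery accepts as a TIMESTAMP."""
    ts = datetime.fromtimestamp(record[field], tz=timezone.utc)
    record[field] = ts.strftime("%Y-%m-%d %H:%M:%S UTC")
    return record

def rewrite_ndjson(src_path, dst_path, field="created_at"):
    """Rewrite a newline-delimited JSON file, one record per line."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            record = convert_epoch_field(json.loads(line), field)
            dst.write(json.dumps(record) + "\n")
```

If the epoch values are in milliseconds, divide by 1000 before calling `datetime.fromtimestamp`.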

After

In this case, you let BigQuery load your JSON file into a temporary table, with the UNIX timestamp typed as a NUMBER or a STRING. Then run a query against this temporary table, convert the field to the correct TIMESTAMP format, and insert the data into the final table.

This way is smoother and easier (a simple SQL query to write). However, it implies the cost of reading all the loaded data (in order to write it back).
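The conversion step can be a single `INSERT ... SELECT` using BigQuery's `TIMESTAMP_SECONDS` function (`TIMESTAMP_MILLIS` if the epoch is in milliseconds). A sketch that builds the query; the table and column names are placeholders of my choosing:

```python
def conversion_sql(src_table, dst_table, epoch_col):
    """Build an INSERT ... SELECT that copies rows from the temporary
    table into the final table, replacing the INT64 epoch column
    (in seconds) with a proper TIMESTAMP via TIMESTAMP_SECONDS."""
    return (
        f"INSERT INTO `{dst_table}` "
        f"SELECT * REPLACE (TIMESTAMP_SECONDS({epoch_col}) AS {epoch_col}) "
        f"FROM `{src_table}`"
    )

# With the google-cloud-bigquery client installed and authenticated,
# the query would be run as (not executed here):
#   from google.cloud import bigquery
#   bigquery.Client().query(
#       conversion_sql("my_project.staging.events_raw",
#                      "my_project.dataset.events",
#                      "event_ts")
#   ).result()
```

This assumes the final table has the same schema as the temporary one, except that the epoch column is declared as TIMESTAMP.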

