Apache Beam Dataflow BigQuery IO without schema
Is there any way to write unstructured data to a BigQuery table using the Apache Beam Dataflow BigQuery IO API (i.e. without providing a schema upfront)?
BigQuery needs to know the schema when it creates the table or when data is written to it. Depending on your situation, you may be able to determine the schema dynamically in the pipeline construction code rather than hard-coding it.
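As a rough illustration of determining a schema dynamically, the sketch below derives a BigQuery-style field list from one sample record. The helper names (bigQueryType, inferSchema) and the type mapping are assumptions made for this example and are not part of any Beam or BigQuery API. In a real pipeline this logic would live in the Java or Python construction code; it is written in JavaScript here only to match the UDF language used further down.

```javascript
// Hypothetical helper: map a JavaScript value to a BigQuery column type.
function bigQueryType(value) {
  if (typeof value === 'boolean') return 'BOOLEAN';
  if (typeof value === 'number') {
    // Note: 1.0 counts as an integer in JavaScript.
    return Number.isInteger(value) ? 'INTEGER' : 'FLOAT';
  }
  // Strings, nulls, arrays and nested objects all fall back to STRING
  // (assumption: nested values would be stored as serialized JSON).
  return 'STRING';
}

// Hypothetical helper: build a field list from one sample record.
function inferSchema(sample) {
  return Object.keys(sample).map(function (name) {
    return { name: name, type: bigQueryType(sample[name]), mode: 'NULLABLE' };
  });
}

// Example: inspect one record and print the derived field list.
var schema = inferSchema({ user: 'ann', clicks: 3, score: 0.5, active: true, tags: ['a'] });
console.log(JSON.stringify(schema, null, 2));
```

A single sample is only a guess at the full schema; fields missing from that record would not be discovered, so this is best treated as a starting point.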
CREATE TABLE IF NOT EXISTS `your_project.dataset.rawdata` (
  raw STRING
);
With a single STRING column like this, you can store any data as a string without knowing its schema. For example, you can store a JSON document as a single string, a CSV line as a string, and so on.
/**
 * User-defined function (UDF) to transform events
 * as part of a Dataflow template job.
 *
 * @param {string} inJson input Pub/Sub JSON message (stringified)
 * @return {string} outJson output JSON message (stringified)
 */
function process(inJson) {
  var obj = JSON.parse(inJson),
      includePubsubMessage = obj.data && obj.attributes,
      data = includePubsubMessage ? obj.data : obj;

  // INSERT CUSTOM TRANSFORMATION LOGIC HERE

  return JSON.stringify(obj);
}
As you can see, the sample UDF above returns a JSON string.
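To tie the UDF to the single-column rawdata table, one possible variant wraps the whole incoming message in a raw field instead of parsing it. This is a minimal sketch, assuming the template maps the keys of the returned JSON to table columns. The function name toRawRow is hypothetical; use whatever name your template job is configured to call.

```javascript
/**
 * Hypothetical UDF variant: store the incoming message untouched
 * in a single "raw" field, without inspecting its structure.
 *
 * @param {string} inMessage input message (JSON, CSV line, or any text)
 * @return {string} stringified JSON object with one "raw" field
 */
function toRawRow(inMessage) {
  // No JSON.parse here, so non-JSON payloads (e.g. CSV) pass through too.
  return JSON.stringify({ raw: inMessage });
}

// Example: a JSON payload and a CSV line are both wrapped the same way.
console.log(toRawRow('{"user":"ann","clicks":3}'));
console.log(toRawRow('ann,3,2020-01-01'));
```

Because the payload is never parsed, malformed input cannot fail the transform; the trade-off is that all structure has to be recovered later at query time.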
SELECT JSON_VALUE(raw, '$.json_path_you_have') AS column1,
JSON_QUERY_ARRAY(raw, '$.json_path_you_have') AS column2,
...
FROM `your_project.dataset.rawdata`
Depending on your source data, you can use JSON functions or regular expressions at query time to organize the raw strings into a table with the schema you want.