I want to extract JSON format data with BigQuery: UDF or JSON_EXTRACT

I have a table with the following structure:

user_id int,

purchase_ids string (in JSON format)

The JSON contained in one record of this table looks like this:

user_id = 0001

1:{
  shop_id:1,
  product_id :1111,
  value: 1
},
2:{
  shop_id:1,
  product_id :2222,
  value: 1
},
3:{
  shop_id:1,
  product_id :3333,
  value: 1
},
... (the number of entries varies from record to record)

Desired final output:

| user_id | shop_id | product_id | value |
| 0001    | 1       |  1111      | 1     |
| 0001    | 1       |  2222      | 1     |
| 0001    | 1       |  3333      | 1     |
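To pin down the transformation being asked for, here is a plain JavaScript sketch (JavaScript because that is what BigQuery JS UDF bodies run) that turns one record into the rows above. It assumes the stored string is valid JSON, i.e. the record above with its keys quoted; `toRows` is a hypothetical helper name, not part of any BigQuery API.

```javascript
// Assumption: purchase_ids is valid JSON with quoted keys, i.e. the record
// above serialized as {"1": {...}, "2": {...}, "3": {...}}.
const userId = "0001";
const purchaseIds =
  '{"1": {"shop_id": 1, "product_id": 1111, "value": 1},' +
  ' "2": {"shop_id": 1, "product_id": 2222, "value": 1},' +
  ' "3": {"shop_id": 1, "product_id": 3333, "value": 1}}';

// Hypothetical helper: one (user_id, purchase_ids) record -> one row per purchase.
// The numbered keys ("1", "2", ...) are discarded; only their values are kept.
// Object.values lists integer-like keys in ascending numeric order.
function toRows(userId, json) {
  const purchases = Object.values(JSON.parse(json));
  return purchases.map(p => ({
    user_id: userId,
    shop_id: p.shop_id,
    product_id: p.product_id,
    value: p.value,
  }));
}

const rows = toRows(userId, purchaseIds);
console.log(rows);
```

Because the number of entries varies per record, the output is one row per value in the object, not a fixed set of columns.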

I tried the following query, but it doesn't seem to work: shop_id and product_id come back as null.

CREATE TEMP FUNCTION jsonparse(json_row STRING)
  RETURNS STRING
  LANGUAGE js AS """
  var res = array();
  json_row.forEach(([key, value]) => {
     res = value;
  });
  return res
""";

with
parse as(
select
  user_id,
  jsonparse(purchase_ids) as pids
from
  sample
)


select
  user_id,
  JSON_EXTRAXT(pid,"$.shop_id") as shop_id,
  JSON_EXTRAXT(pid,"$.product_id") as product_id
from
  parse,
  unnest(pids,",") pid

How can I get this right?

From my point of view, your use case calls for a NESTED and REPEATED column, which can represent this JSON-like structure natively. For example, the following query returns the result you are looking for:

WITH users AS
  (SELECT "0001" as user_id, ARRAY<STRUCT<shop_id INT64, product_id INT64, value INT64>>[(1, 1111,1),
    (1, 2222,1), (1, 3333,1)] AS shops)
SELECT u.user_id, s.*
FROM users u, UNNEST(shops) s;
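The comma in `FROM users u, UNNEST(shops) s` is an implicit CROSS JOIN between each row and the elements of its own `shops` array. Here is a small JavaScript model of that flattening, for illustration only: the `users` data mirrors the literal in the query, and `unnestShops` is a hypothetical name. It describes the result, not how BigQuery executes the query.

```javascript
// One parent row with a REPEATED STRUCT column, mirroring the query's literal.
const users = [
  {
    user_id: "0001",
    shops: [
      { shop_id: 1, product_id: 1111, value: 1 },
      { shop_id: 1, product_id: 2222, value: 1 },
      { shop_id: 1, product_id: 3333, value: 1 },
    ],
  },
];

// SELECT u.user_id, s.* FROM users u, UNNEST(shops) s
// -> pair each user with every struct in its own array and spread the struct's fields.
function unnestShops(rows) {
  return rows.flatMap(u => u.shops.map(s => ({ user_id: u.user_id, ...s })));
}

const flattened = unnestShops(users);
console.log(flattened);
```

A user whose `shops` array is empty produces no rows in this comma (CROSS JOIN) form; `LEFT JOIN UNNEST(shops) s` would keep such users with NULL struct fields.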

For simplicity, you can create this type of column from the Console and try this approach by following this guide.

Below is a working version of your use case (BigQuery Standard SQL):

#standardSQL
CREATE TEMP FUNCTION jsonparse(input STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
  return JSON.parse(input).map(x=>JSON.stringify(x));
"""; 
WITH sample AS (
  SELECT "0001" AS user_id, 
  '''[{"shop_id": 1, "product_id" :1111, "value": 1},
  {"shop_id": 1, "product_id" :2222, "value": 1},
  {"shop_id": 1, "product_id" :3333, "value": 1}]''' AS purchase_ids
), parse AS (
  SELECT user_id,
    jsonparse(purchase_ids) AS pids
  FROM sample
) 
SELECT
  user_id,
  JSON_EXTRACT(pid,"$.shop_id") AS shop_id,
  JSON_EXTRACT(pid,"$.product_id") AS product_id,
  JSON_EXTRACT(pid,"$.value") AS value
FROM parse,
UNNEST(pids) pid
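The UDF above calls `.map` on the parsed value, so it assumes `purchase_ids` holds a JSON array, as in the sample data. The record in the question is instead an object keyed by "1", "2", "3"; a plain object has no `map` method, so `JSON.parse(input).map(...)` would throw for it. A minimal sketch of a UDF body that accepts either shape, assuming the stored string is valid JSON. It is wrapped in a function named after the UDF only so it can be run outside BigQuery; inside `LANGUAGE js AS """ ... """` you would keep just the body, with `input` as the parameter.

```javascript
// Body of a JS UDF declared as:
//   CREATE TEMP FUNCTION jsonparse(input STRING) RETURNS ARRAY<STRING>
// Assumption: input is either a JSON array of purchases (as in the answer's
// sample) or an object keyed by "1", "2", ... (as in the question).
function jsonparse(input) {
  // Assumption: treat a NULL column value as "no purchases".
  if (input === null || input === undefined) return [];
  const parsed = JSON.parse(input);
  // Arrays already hold one element per purchase; for keyed objects keep the values.
  const purchases = Array.isArray(parsed) ? parsed : Object.values(parsed);
  // Re-serialize each purchase so JSON_EXTRACT can read it as a STRING.
  return purchases.map(p => JSON.stringify(p));
}

console.log(jsonparse('{"1": {"shop_id": 1, "product_id": 1111, "value": 1}}'));
```

The rest of the query (the `parse` CTE, `UNNEST(pids)` and the `JSON_EXTRACT` calls) can stay as it is.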

with result:

Row user_id shop_id product_id  value    
1   0001    1       1111        1    
2   0001    1       2222        1    
3   0001    1       3333        1    
