简体   繁体   English

查找嵌套JSON中具有“ true”值的所有记录

[英]Finding all records which have a 'true' value in nested JSON

This is my nested JSON: 这是我的嵌套JSON:

{  
   "business_id":"pNQwnY_q4okdlnPiR-3RBA",
   "full_address":"6105 S Fort Apache Rd\nSpring Valley\nLas Vegas, NV 89148",
   "hours":{  },
   "open":true,
   "categories":[  ],
   "city":"Las Vegas",
   "review_count":68,
   "name":"Empire Bagels",
   "neighborhoods":[  
      "Spring Valley"
   ],
   "longitude":-115.298175926911,
   "state":"NV",
   "stars":3.0,
   "latitude":36.07728616051,
   "attributes":{  
      "Take-out":true,
      "Wi-Fi":"no",
      "Good For":{  
         "dessert":false,
         "latenight":false,
         "lunch":false,
         "dinner":false,
         "breakfast":true,
         "brunch":false
      },
      "Caters":true,
      "Noise Level":"quiet",
      "Takes Reservations":false,
      "Delivery":false,
      "Ambience":{  
         "romantic":false,
         "intimate":false,
         "classy":false,
         "hipster":false,
         "divey":false,
         "touristy":false,
         "trendy":false,
         "upscale":false,
         "casual":true
      },
      "Parking":{  
         "garage":false,
         "street":false,
         "validated":false,
         "lot":true,
         "valet":false
      },
      "Has TV":true,
      "Outdoor Seating":true,
      "Attire":"casual",
      "Alcohol":"none",
      "Waiter Service":false,
      "Accepts Credit Cards":true,
      "Good for Kids":true,
      "Good For Groups":true,
      "Price Range":1
   },
   "type":"business"
}

I am querying this Using apache drill. 我正在查询“使用apache演练”。 I want to find out the top 10 most common 'true' attributes for all restaurants in a city.I want it something like: 我想找出城市中所有餐厅的十大最常见``真实''属性,例如:

Accepts Credit Cards : 200,
Alcohol: 300,
Good For Kids : 500

How will my query look like? 我的查询看起来如何? This is what I did: 这是我所做的:

select attributes, count(*) attributes from `yelp_dataset` group by attributes;

I get this error: 我收到此错误:

Error: SYSTEM ERROR: UnsupportedOperationException: Map, Array, Union or repeated scalar type should not be used in group by, order by or in a comparison operator. Drill does not support compare between MAP:REQUIRED and MAP:REQUIRED.

Fragment 0:0

[Error Id: 8fe8a616-92c7-4da0-ab65-b5542d391f47 on 192.168.10.104:31010] (state=,code=0)

What should my query be? 我的查询应该是什么?

I was not able to auto-flatten the attributes using KVGEN() because of mixed data types, but you might try a CTE with a bit of UNION ALL brute force: 由于数据类型混合,我无法使用KVGEN()自动展平属性,但是您可以尝试使用UNION ALL蛮力进行CTE:

WITH ReviewAttributes AS (
    SELECT
      reviews.name,
      'Accepts Credit Cards' as `AttributeName`,
      CASE WHEN reviews.attributes.`Accepts Credit Cards` = true THEN 1 ELSE 0 END as `AttributeValue`
    FROM 
      `yelp_dataset` reviews
    UNION ALL
    SELECT
      reviews.name,
      'Alcohol' as `AttributeName`,
      CASE WHEN reviews.attributes.`Alcohol` <> 'none' THEN 1 ELSE 0 END as `AttributeValue`
    FROM
      `yelp_dataset` reviews
    UNION ALL
    SELECT
      reviews.name,
      'Good for Kids' as `AttributeName`,
      CASE WHEN reviews.attributes.`Good for Kids` = true THEN 1 ELSE 0 END as `AttributeValue`
    FROM
      `yelp_dataset` reviews
)
SELECT 
  `AttributeName`,
  SUM(`AttributeValue`) as `AttributeCount`
FROM
  ReviewAttributes
GROUP BY
  `AttributeName`;

The CASE statements might also help you work around some of the differences between boolean and enumerated fields, like counting Alcohol from your sample. CASE语句还可以帮助您解决布尔字段和枚举字段之间的某些差异,例如从样品中计算Alcohol

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM