Python glom with list of records: group common unique client_ids together as key
I just discovered glom and the tutorial makes sense, but I can't figure out the right spec to use on Chrome BrowserHistory.json entries to create a data structure grouped by client_id, or whether this is even the right use of glom. I think I can accomplish this with other methods by looping over the JSON, but I was hoping to learn more about glom and its capabilities.
The JSON has a Browser_History key holding a list with one dict per history entry, as follows:
{
"Browser_History": [
{
"favicon_url": "https://www.google.com/favicon.ico",
"page_transition": "LINK",
"title": "Google Takeout",
"url": "https://takeout.google.com",
"client_id": "abcd1234",
"time_usec": 1424794867875291
},
...
I'd like a data structure where everything is grouped by client_id, with each client_id as the key to a list of dicts, something like:
{ 'client_ids' : {
'abcd1234' : [ {
"title" : "Google Takeout",
"url" : "https://takeout.google.com",
...
},
...
],
'wxyz9876' : [ {
"title" : "Google",
"url" : "https://www.google.com",
...
},
...
]
}
}
Is this something glom is suited for? I've been playing around with it and reading, but I can't seem to get the spec right for what I need. The best I've got without an error is:
import json
from pprint import pprint
from glom import glom

with open(history_json) as f:
    history_list = json.load(f)['Browser_History']
spec = {
    'client_ids': ['client_id']
}
pprint(glom(history_list, spec))
which gets me a list of all the client_ids, but I can't figure out how to group the entries under them as keys rather than getting one big list. Any help would be appreciated, thanks!
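For reference, the plain-loop fallback mentioned earlier might look roughly like the sketch below. The three sample entries and the `group_by_client_id` helper are made up for illustration; the real list would come from `json.load(f)['Browser_History']`.

```python
from collections import defaultdict

# Hypothetical sample standing in for json.load(f)['Browser_History'].
history_list = [
    {"title": "Google Takeout", "url": "https://takeout.google.com", "client_id": "abcd1234"},
    {"title": "Google", "url": "https://www.google.com", "client_id": "wxyz9876"},
    {"title": "Google", "url": "https://www.google.com", "client_id": "abcd1234"},
]


def group_by_client_id(entries):
    """Illustrative helper: build {client_id: [entry, ...]} with a plain loop."""
    groups = defaultdict(list)
    for entry in entries:
        # Assumes every entry has a client_id; a missing key raises KeyError.
        groups[entry["client_id"]].append(entry)
    return dict(groups)


grouped = {"client_ids": group_by_client_id(history_list)}
```

This is the shape I'm hoping to get out of a glom spec instead.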
This should do the trick, although I'm not sure it is the most "glom"-ic way to achieve this.
import glom

grouping_key = "client_id"  # field on each history entry to group by

def group_combine(existing, incoming):
    # existing is the dictionary used for accumulating the data
    # incoming is each item in the list (your input)
    if grouping_key in incoming:  # skip entries that have no client_id
        # create the list for this client_id on first sight, then append
        existing.setdefault(incoming[grouping_key], []).append(incoming)
    return existing

data = {'Browser_History': [{}]}  # your data structure

# Fold walks the list, starting from dict() and calling group_combine per item
fold_spec = glom.Fold(glom.T, init=dict, op=group_combine)
results = glom.glom(data["Browser_History"], {"client_ids": fold_spec})