
Python glom: group a list of records by client_id, using each unique client_id as a key

I just discovered glom and the tutorial makes sense, but I can't figure out the right spec to turn Chrome BrowserHistory.json entries into a data structure grouped by client_id, or whether this is even the right use of glom. I think I could accomplish this by looping over the JSON myself, but I was hoping to learn more about glom and its capabilities.

The JSON has a Browser_History key holding a list with one dict per history entry, as follows:

{
    "Browser_History": [
        {
            "favicon_url": "https://www.google.com/favicon.ico",
            "page_transition": "LINK",
            "title": "Google Takeout",
            "url": "https://takeout.google.com",
            "client_id": "abcd1234",
            "time_usec": 1424794867875291
        },
...

I'd like a data structure where everything is grouped by client_id, with each client_id as the key to a list of dicts, something like:

{ 'client_ids' : {
                'abcd1234' : [ {
                                 "title" : "Google Takeout",
                                 "url"   : "https://takeout.google.com",
                                 ...
                             },
                             ...
                             ],
                'wxyz9876' : [ {
                                 "title" : "Google",
                                 "url"   : "https://www.google.com",
                                 ...
                             },
                             ...
                             ]
              }
}

Is this something glom is suited for? I've been experimenting with it and reading the docs, but I can't get the spec right. The best I've got without an error is:

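import json
from pprint import pprint

from glom import glom

with open(history_json) as f:  # history_json: path to BrowserHistory.json
    history_list = json.load(f)['Browser_History']

# A one-element list spec applies 'client_id' to every record in the target list
spec = {
    'client_ids': ['client_id']
}
pprint(glom(history_list, spec))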

which gets me a list of all the client_ids, but I can't figure out how to group the records under them as keys rather than getting one big list. Any help would be appreciated, thanks!

This should do the trick, although I'm not sure it is the most "glom"-ic way to achieve it.

import glom

grouping_key = "client_id"  # the field in each record to group by

def group_combine(existing, incoming):
    # existing is the dictionary used for accumulating the groups
    # incoming is each item in the list (your input)
    if grouping_key in incoming:  # skip records that lack the key
        existing.setdefault(incoming[grouping_key], []).append(incoming)
    return existing


data = {'Browser_History': [{}]}  # your data structure

# Fold walks the list, starting from an empty dict and calling group_combine per record
fold_spec = glom.Fold(glom.T, init=dict, op=group_combine)
results = glom.glom(data["Browser_History"], {"client_ids": fold_spec})
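
For comparison, the same grouping can be written as a plain loop without glom, which is roughly what the question means by "looping over the json". This is only a minimal sketch using the standard library: the function name group_by_key and the sample records are made up for illustration and are not part of glom or of the real BrowserHistory.json.

```python
from collections import defaultdict


def group_by_key(records, key="client_id"):
    """Group a list of dicts by the value stored under `key`."""
    groups = defaultdict(list)
    for record in records:
        if key in record:  # skip entries that lack the key
            groups[record[key]].append(record)
    # Wrap the groups under 'client_ids' to match the shape asked for in the question
    return {"client_ids": dict(groups)}


# Hypothetical sample shaped like the Browser_History entries in the question
sample = {
    "Browser_History": [
        {"title": "Google Takeout", "url": "https://takeout.google.com", "client_id": "abcd1234"},
        {"title": "Google", "url": "https://www.google.com", "client_id": "wxyz9876"},
        {"title": "No client", "url": "https://example.com"},
    ]
}

grouped = group_by_key(sample["Browser_History"])
print(sorted(grouped["client_ids"]))
```

The Fold version above does the same accumulation; whether the glom spec or the plain loop reads better probably depends on how much of the rest of the pipeline is already expressed in glom.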

