
Remove Python dict from JSON file response

I've looked over a few resources, such as "Remove python dict item from nested json file", but I can't seem to get my code to work. From what I understand of my JSON below (a placeholder variable for a WAY longer dump), it's a dict with a dict inside it, with another dict inside that, and so on, with lists scattered throughout. What I ultimately want to see is the following printout in my Terminal:

Message: [Client ID] 
Link: "http://linkgoeshere.com"

Here's what I have so far:

ThreeLine= {u'hits': {u'hits': [{u'_id': u'THIS IS THE FIRST ONE',
                  u'_index': u'foo',
                  u'_score': None,
                  u'_source': {u'@timestamp': u'2015-12-21T16:59:40.000-05:00',
                               u'message': u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}',
                               u'system': u'user-info'}},
                {u'_id': u'THIS IS THE SECOND ONE',
                  u'_index': u'two',
                  u'_score': None,
                  u'_source': {u'@timestamp': u'2015-12-12 T16:59:40.000-05:00',
                               u'message': u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":565656} {"tags":{"sent":"21","HTML":"4512"},"person":"15651"}',
                               u'system': u'user-info'}},
]}}
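To check the nesting described above, here is a quick sketch that indexes into a trimmed copy of `ThreeLine` (first hit only, so it runs on its own) one level at a time and reports the type found at each step. It assumes Python 3, where the `u''` prefixes are still accepted but every string is a plain `str`.

```python
# Trimmed copy of the question's ThreeLine (first hit only), so this runs standalone.
ThreeLine = {u'hits': {u'hits': [
    {u'_id': u'THIS IS THE FIRST ONE',
     u'_index': u'foo',
     u'_score': None,
     u'_source': {u'@timestamp': u'2015-12-21T16:59:40.000-05:00',
                  u'message': u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}',
                  u'system': u'user-info'}},
]}}

# Each step indexes one level deeper into the structure.
steps = [
    ("ThreeLine", ThreeLine),
    ("ThreeLine['hits']", ThreeLine['hits']),
    ("ThreeLine['hits']['hits']", ThreeLine['hits']['hits']),
    ("ThreeLine['hits']['hits'][0]", ThreeLine['hits']['hits'][0]),
    ("...[0]['_source']", ThreeLine['hits']['hits'][0]['_source']),
    ("...[0]['_source']['message']", ThreeLine['hits']['hits'][0]['_source']['message']),
]

# Record and print the type name at every level.
types = [type(value).__name__ for _, value in steps]
for (label, _), type_name in zip(steps, types):
    print(label, '->', type_name)
```

Only one level is a list (`ThreeLine['hits']['hits']`). Every other level down to `_source` is a dict, and `message` itself is a plain string.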

unpacking= ThreeLine['hits']['hits'] # we only want to work with the inner 'hits' list.


for d in unpacking:
    newinfo= []
    narrow=[d["_source"] for d in unpacking if "_source" in d] 
    narrower=[d["message"] for d in narrow if "message" in d]
    newinfo.append(narrower)
print newinfo

Right now, with the code as it is, it'll print both entries, but it has a lot of random junk I don't care about, like all of the tags:

{"tags":{"sent":"21","HTML":"4512"},"person":"15651"}',

So, how do I further strip out those entries so I just wind up with the two lines I ultimately want out of this insanely nested mess? If anyone has ideas for how I can clean up the current code, I'm all ears and ready to learn!
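On cleaning up the current code: the outer `for d in unpacking:` loop isn't needed, because each list comprehension already walks every hit, and `newinfo` is reset on every pass anyway. Below is a minimal sketch of the same extraction as a single comprehension. It assumes the `ThreeLine` shape shown above and includes a trimmed two-hit copy (only the keys this code touches) so it runs on its own in Python 3.

```python
# Trimmed copy of the question's data: only the keys this code touches.
ThreeLine = {'hits': {'hits': [
    {'_id': 'THIS IS THE FIRST ONE',
     '_source': {'message': 'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}'}},
    {'_id': 'THIS IS THE SECOND ONE',
     '_source': {'message': 'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":565656} {"tags":{"sent":"21","HTML":"4512"},"person":"15651"}'}},
]}}

# One pass over the hits; .get() lets entries without '_source' or 'message' be skipped.
messages = [
    hit['_source']['message']
    for hit in ThreeLine['hits']['hits']
    if 'message' in hit.get('_source', {})
]
print(messages)
```

The original code produces a list wrapped in another list. This version yields a flat list of the message strings, which is easier to loop over in the next step.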

The 'tags' dictionary is not a dictionary. It is text embedded in the message string:

>>> ThreeLine['hits']['hits'][0]['_source']['message']
u'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}'
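To make the difference concrete: dict-style lookups on the message fail because it is a plain string, and the `{...}` chunks only become dicts if you parse them. This isn't needed for the two-line printout. The sketch below, written for Python 3, pulls the embedded JSON objects out of the text with the standard library's `json.JSONDecoder.raw_decode`. It assumes that every `{` outside those objects is absent, which holds for the sample message.

```python
import json

message = 'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}'

# The message is a str, so indexing it with a key raises TypeError.
try:
    message['tags']
except TypeError as exc:
    print('not a dict:', exc)

# Decode each top-level {...} chunk embedded in the text.
decoder = json.JSONDecoder()
objects = []
pos = message.find('{')
while pos != -1:
    # raw_decode returns the parsed object and how many characters it consumed.
    obj, length = decoder.raw_decode(message[pos:])
    objects.append(obj)
    pos = message.find('{', pos + length)

print(objects)
```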

You'll have to do some string parsing to remove that. You could use a regular expression:

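import re

# Group 1: the bracketed client ID, e.g. "[Client ID ]".
# Group 2: the URL after "Information Link:" (word characters, "/" and ".").
id_and_link = re.compile(r'(\[[^]]+\]) Information Link: (https?://[\w/.]+)')

# Lazily pull out each message, skipping hits without '_source' or 'message'.
messages = (entry['_source']['message']
            for entry in ThreeLine['hits']['hits']
            if '_source' in entry and 'message' in entry['_source'])
for message in messages:
    match = id_and_link.search(message)
    if not match:
        continue  # message doesn't follow the expected format
    id_, link = match.groups()
    print 'Message:', id_
    print 'Link:', link
    print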
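For Python 3 (where `print` is a function), here is a sketch of the same regex approach wrapped in a function that returns `(id, link)` pairs instead of printing them, which makes it easier to test. The name `extract_id_and_link` is hypothetical. The link pattern is loosened to `\S+` (any run of non-whitespace) instead of a fixed character class, on the assumption that the URL is always followed by a space. That way links containing `-`, `?` or `=` also match.

```python
import re

# Capture the bracketed ID, then the URL after "Information Link:".
# \S+ assumes the URL ends at the next whitespace character.
ID_AND_LINK = re.compile(r'(\[[^]]+\]) Information Link: (https?://\S+)')


def extract_id_and_link(data):
    """Return (id, link) pairs from a ThreeLine-shaped dict (hypothetical helper)."""
    pairs = []
    for entry in data['hits']['hits']:
        message = entry.get('_source', {}).get('message')
        if message is None:
            continue  # hit has no '_source' or no 'message'
        match = ID_AND_LINK.search(message)
        if match:
            pairs.append(match.groups())
    return pairs


# Trimmed copy of the question's data: only the keys the function reads.
ThreeLine = {'hits': {'hits': [
    {'_source': {'message': 'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":121212} {"tags":{"sent":"15","HTML":"5661"},"person":"15651"}'}},
    {'_source': {'message': 'Application.INFO: [Client ID ] Information Link: http://google.com {"check1":565656} {"tags":{"sent":"21","HTML":"4512"},"person":"15651"}'}},
]}}

for id_, link in extract_id_and_link(ThreeLine):
    print('Message:', id_)
    print('Link:', link)
    print()
```

Keeping extraction separate from printing means the same function can feed a file, a log, or a test without changes.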
