
Python Loop through dictionary

I have a file that I want to parse. Its data is in JSON format, but the file is not a JSON file. I want to loop through the file and pull out the id of every record whose totalReplyCount is greater than 0.

{  "totalReplyCount": 0,
   "newLevel":{
      "main":{
         "url":"http://www.someURL.com",
         "name":"Ronald Whitlock",
         "timestamp":"2016-07-26T01:22:03.000Z",
         "text":"something great"
      },
      "id":"z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04"
   }
},
{  "totalReplyCount": 4,
   "newLevel":{
      "main":{
         "url":"http://www.someUR2L.com",
         "name":"other name",
         "timestamp":"2016-07-26T01:22:03.000Z",
         "text":"something else great"
      },
      "id":"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"
   }
},

My initial attempt was the following:

def readCsv(filename):
    with open(filename, 'r') as csvFile:
        for row in csvFile["totalReplyCount"]:
            print row

but I get an error stating

TypeError: 'file' object has no attribute '__getitem__'

I know this is only an attempt at printing and does not do what I actually want, but I am a novice at Python and lost as to what I am doing wrong. What is the correct way to do this? My end result should be a list of the ids, like this:

['insdisndiwneien23e2es', 'lsndion2ei2esdsd',....]

EDIT 1 - 7/26/16

I saw that I made a mistake in my formatting when I copied the sample (it was late and I was tired). I switched it to a proper format that is more like JSON; this new edit properly matches the file I am parsing. I then tried to parse it with the json module and got ValueError: Extra data: line 2 column 1 - line X column 1, where X is the last line of the file.

import json
from pprint import pprint

def readCsv(filename):
    with open(filename, 'r') as file:
        data = json.load(file)
        pprint(data)
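The "Extra data" error means `json.load` parsed the first complete value and then found more text after it, because it expects exactly one top-level value per document. One way around this is a minimal sketch that assumes the file holds several top-level objects separated only by whitespace and/or commas, as in the sample above. It decodes the objects one at a time with `json.JSONDecoder.raw_decode`, which returns the parsed value together with the index where it stopped. The helper name `iter_objects` and the inline sample are hypothetical.

```python
import json

def iter_objects(text):
    """Yield each top-level JSON value in `text`, assuming the values are
    separated only by whitespace and/or commas (hypothetical helper)."""
    decoder = json.JSONDecoder()
    pos = 0
    while True:
        # Skip the separators between top-level objects
        while pos < len(text) and text[pos] in " \t\r\n,":
            pos += 1
        if pos >= len(text):
            return
        # raw_decode parses one value from the start of the string and
        # returns it together with the index where that value ended
        obj, end = decoder.raw_decode(text[pos:])
        pos += end
        yield obj

# Inline sample in the same shape as the file: two records, trailing commas
sample = ('{"totalReplyCount": 0, "newLevel": {"id": "a1"}},\n'
          '{"totalReplyCount": 4, "newLevel": {"id": "b2"}},\n')
print([obj["totalReplyCount"] for obj in iter_objects(sample)])  # prints [0, 4]
```

With a real file you would pass `f.read()` instead of `sample`. Unlike `json.load`, `raw_decode` does not complain about text that follows the first value.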

I also tried DictReader and got KeyError: 'totalReplyCount'. Is the dictionary unordered?
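Ordering is not the cause of that KeyError. `csv.DictReader` knows nothing about JSON. It takes its field names from the first line of the file, split on commas, and returns every later line as a dict keyed by those fragments, so `'totalReplyCount'` never becomes a key. The small demonstration below uses an in-memory file; the two-line sample is made up for illustration.

```python
import csv
import io

# Illustrative two-line sample: one JSON object per line
sample = io.StringIO(
    '{"totalReplyCount": 0, "newLevel": {"id": "a1"}}\n'
    '{"totalReplyCount": 4, "newLevel": {"id": "b2"}}\n'
)

reader = csv.DictReader(sample)
rows = list(reader)

# DictReader used the first line as the header row, split on commas,
# so the "field names" are fragments of the first JSON object
print(reader.fieldnames)
# Only the second line comes back as a data row, keyed by those fragments
print(len(rows))
print("totalReplyCount" in rows[0])  # prints False
```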

EDIT 2 - 7/27/16

After taking a break, coming back to it and thinking it over, I realized that what I have (after proper massaging of the data) is a CSV file that contains a complete JSON object on each line. So I have to parse the CSV file, then parse each line, which is a top-level, whole and complete JSON object. The code I used to try to parse this is below, but all I get is the first character of the string, an opening curly brace '{':

def readCsv(filename):
    with open(filename, 'r') as csvfile:
        for row in csv.DictReader(csvfile):
            for item in row:
                print item[0]

I am guessing that DictReader is converting the JSON object to a string, and that is why I only get a curly brace instead of the first key. If I print item[0:5] instead, I get a mishmash of the first five characters, in no particular order, on each line. I assume this is because the data has turned into an unordered list? I think I understand my problem a little better, but I am still wrapping my head around the data structures and the methods used to parse them. What am I missing?
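If each line is already a complete JSON object, the csv module is not needed at all. csv splits on commas and knows nothing about JSON, which is why the rows come back as string fragments. A minimal sketch under that one-object-per-line assumption passes each line to `json.loads` instead. The function name `ids_from_lines` and the sample lines are made up for illustration. The id is read from inside `newLevel`, following the sample data in the question.

```python
import json

def ids_from_lines(lines):
    """Collect newLevel.id for every record whose totalReplyCount > 0,
    assuming each non-blank line holds one complete JSON object."""
    ids = []
    for line in lines:
        # Drop surrounding whitespace and any trailing comma separator
        line = line.strip().rstrip(",")
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # one JSON object -> one nested dict
        if record["totalReplyCount"] > 0:
            ids.append(record["newLevel"]["id"])
    return ids

# Illustrative lines; a file object opened in text mode works the same way,
# e.g. with open("data.txt") as f: ids_from_lines(f)  (file name hypothetical)
sample_lines = [
    '{"totalReplyCount": 0, "newLevel": {"id": "a1"}},\n',
    '\n',
    '{"totalReplyCount": 4, "newLevel": {"id": "b2"}}\n',
]
print(ids_from_lines(sample_lines))  # prints ['b2']
```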

After reading the question and the other answers, please check whether this is useful to you.

I have treated the input as a plain text file, not as a CSV or JSON file.

The flow of the code is as follows:

  • Open the file and read its lines in reverse order.
  • Search each line for an id. If one is found, extract it and store it in a temporary variable.
  • Keep reading the file line by line and search for totalReplyCount.
  • Once you find totalReplyCount, check whether it is greater than 0.
  • If it is, append the temporary id to id_list and reset the temporary variable.
import re

tmp_id_to_store = ''
id_list = []
# Walk the lines bottom-up, so each record's "id" line is seen
# before that record's "totalReplyCount" line
with open("a.txt") as f:
    lines = f.readlines()
for line in reversed(lines):
    m = re.search(r'"id":\s*"(\w+)"', line)
    if m:
        tmp_id_to_store = m.group(1)
    # \s* tolerates any amount of spacing after the colon
    n = re.search(r'"totalReplyCount":\s*(\d+)', line)
    if n and int(n.group(1)) > 0:
        id_list.append(tmp_id_to_store)
        tmp_id_to_store = ''
print(id_list)

More checks can be added.

As the error says, your csvFile is a file object, not a dict, so you can't look up an item in it by key.

If your csvFile is in CSV format, you can use the csv module to read each line of the CSV into a dict:

import csv
with open(filename) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row['totalReplyCount'])

Note the DictReader class from the csv module: it reads each CSV line and parses it into a dict keyed by the column names from the header (first) row.
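For a genuine CSV file whose first row is a header, the filtering the question asks for could be sketched as below. The sketch assumes hypothetical columns named `id` and `totalReplyCount`. DictReader returns every value as a string, so the count is converted with `int()` before the comparison.

```python
import csv
import io

def ids_from_csv(csvfile):
    """Return the id of every row whose totalReplyCount is greater than 0.
    Assumes (hypothetically) a header row with 'id' and 'totalReplyCount'."""
    reader = csv.DictReader(csvfile)
    # Values come back as strings, hence the int() conversion
    return [row["id"] for row in reader if int(row["totalReplyCount"]) > 0]

# Illustrative in-memory CSV; with a real file, use
# open(filename, newline="") instead of io.StringIO
sample = io.StringIO("id,totalReplyCount\nz12w,0\nkjsd,4\n")
print(ids_from_csv(sample))  # prints ['kjsd']
```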

If your input file is JSON, why not just use the json module to parse it and then run a for loop over that data? Then it is just a matter of iterating over the keys and extracting the data.

import json
from pprint import pprint

with open('data.json') as data_file:    
    data = json.load(data_file)

pprint(data)
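Once the file is a single valid JSON document, `json.load` returns ordinary lists and dicts, and the extraction becomes a plain loop. The sketch below assumes the document is a top-level array of records shaped like the question's sample, with the id nested under `newLevel`. The function name `ids_from_json` and the inline data are illustrative only.

```python
import json

def ids_from_json(data):
    """Given parsed JSON (assumed: a list of record dicts), return the
    newLevel.id of each record whose totalReplyCount is greater than 0."""
    return [record["newLevel"]["id"]
            for record in data
            if record.get("totalReplyCount", 0) > 0]

# Inline sample standing in for json.load(data_file)
data = json.loads('[{"totalReplyCount": 0, "newLevel": {"id": "a1"}},'
                  ' {"totalReplyCount": 4, "newLevel": {"id": "b2"}}]')
print(ids_from_json(data))  # prints ['b2']
```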

See the Stack Overflow question "Parsing values from a JSON file using Python?"; Justin Peel's answer there should help.

For parsing values from a JSON file in Python, the Stack Overflow question "Parsing values from a JSON file using Python?" has it all.

Here is a shell one-liner that should solve your problem, though it's not Python.

grep -oE '"(totalReplyCount|id)":.*$' filename | awk '/totalReplyCount/ {if ($2+0 > 0) {getline; print}}' | cut -d: -f2

output:

"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM