
Python Loop through dictionary

I have a file that I wish to parse. It contains data in JSON format, but the file is not a JSON file. I want to loop through the file and pull out the ID of every record where totalReplyCount is greater than 0.

  {  "totalReplyCount": 0,
       "newLevel":{ 
           "main":{  
              "url":"http://www.someURL.com",
              "name":"Ronald Whitlock",
              "timestamp":"2016-07-26T01:22:03.000Z",
              "text":"something great"
              },
       "id":"z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04"
    }
},
    {  "totalReplyCount": 4,
        "newLevel":{ 
           "main":{  
              "url":"http://www.someUR2L.com",
              "name":"other name",
              "timestamp":"2016-07-26T01:22:03.000Z",
              "text":"something else great"
             },
       "id":"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"
    }
},

My initial attempt was the following:

def readCsv(filename):
    with open(filename, 'r') as csvFile:
        for row in csvFile["totalReplyCount"]:
            print row

but I get an error stating:

TypeError: 'file' object has no attribute '__getitem__'

I know this is just an attempt at printing and not what I ultimately want to do, but I am a novice at Python and lost as to what I am doing wrong. What is the correct way to do this? My end result for the IDs should look like this:

['insdisndiwneien23e2es', 'lsndion2ei2esdsd',....]

EDIT 1 - 7/26/16

I saw that I made a mistake in my formatting when I copied the code (it was late, I was tired..). I switched it to a proper format that is more like JSON. This new edit properly matches the file I am parsing. I then tried to parse it with the json module and got ValueError: Extra data: line 2 column 1 - line X column 1, where line X is the end of the line.

def readCsv(filename):
    with open(filename, 'r') as file:
        data = json.load(file)
        pprint(data)

I also tried DictReader, and got a KeyError: 'totalReplyCount'. Is the dictionary unordered?

EDIT 2 - 7/27/16

After taking a break, coming back to it, and thinking it over, I realized that what I have (after proper massaging of the data) is a CSV file that contains a proper JSON object on each line. So I have to parse the CSV file, then parse each line, which is a whole and complete top-level JSON object. The code I used to try to parse this is below, but all I get is the first character of the string, an opening curly brace '{':

def readCsv(filename):
    with open(filename, 'r') as csvfile:
        for row in csv.DictReader(csvfile):
            for item in row:
                print item[0]

I am guessing that DictReader is converting the JSON object to a string, and that is why I am only getting a curly brace as opposed to the first key. If I were to do print item[0:5], I would get a mishmash of the first 4 characters in an unordered fashion on each line, which I assume is because the format has turned into an unordered list? I think I understand my problem a little better, but I am still wrapping my head around the data structures and the methods used to parse them. What am I missing?

After reading the question and all the above answers, please check whether this is useful to you.

I have treated the input file as a plain text file, not as a CSV or JSON file.

The flow of the code is as follows:

  • Open the file and read its lines in reverse order.
  • Search each line for the ID. If found, extract it and store it in a temp variable.
  • Keep reading line by line and search for totalReplyCount.
  • Once totalReplyCount is found, check whether it is greater than 0.
  • If so, store the temp ID in id_list and re-initialize the temp variable.
import re

tmp_id_to_store = ''
id_list = []
for line in reversed(open("a.txt").readlines()):
    m = re.search('"id":"(\\w+)"', line.rstrip())
    if m:
        tmp_id_to_store = m.group(1)
    n = re.search('{ "totalReplyCount": (\\d+),', line.rstrip())
    if n:
        fou = n.group(1)
        if int(fou) > 0:
            id_list.append(tmp_id_to_store)
            tmp_id_to_store = ''
print id_list

More checkpoints can be added.
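For readers on Python 3, the same reverse-scan idea might be sketched as a function that takes a list of lines. This is only an illustration under assumptions: the names ID_RE, COUNT_RE, and ids_with_replies are made up here, and the patterns are loosened to tolerate extra whitespace around the colon, which the code above does not do.

```python
import re

# Hypothetical patterns, looser than the ones above: they allow
# optional whitespace around the colon.
ID_RE = re.compile(r'"id"\s*:\s*"(\w+)"')
COUNT_RE = re.compile(r'"totalReplyCount"\s*:\s*(\d+)')

def ids_with_replies(lines):
    """Scan lines bottom-up, pairing each id with the count above it."""
    pending_id = None
    found = []
    for line in reversed(lines):
        id_match = ID_RE.search(line)
        if id_match:
            pending_id = id_match.group(1)
        count_match = COUNT_RE.search(line)
        if count_match and pending_id is not None:
            if int(count_match.group(1)) > 0:
                found.append(pending_id)
            pending_id = None  # reset for the next record
    found.reverse()  # restore top-to-bottom file order
    return found

# Trimmed-down version of the sample data from the question.
sample_lines = [
    '{  "totalReplyCount": 0,',
    '   "newLevel":{',
    '      "id":"z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04"',
    '   }',
    '},',
    '{  "totalReplyCount": 4,',
    '   "newLevel":{',
    '      "id":"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"',
    '   }',
    '},',
]
print(ids_with_replies(sample_lines))
```

With a real file, the lines could come from open(path).readlines(). Like the original, this relies on each record's id appearing after its totalReplyCount.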

As the error states, your csvFile is a file object, not a dict object, so you can't get an item out of it.

If your csvFile is in CSV format, you can use the csv module to read each line of the CSV into a dict:

import csv
with open(filename) as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print row['totalReplyCount']

Note the DictReader class from the csv module: it reads each CSV line and parses it into a dict object.
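A small self-contained sketch may help show what DictReader actually yields. The CSV text below is hypothetical (the asker's real file has no such header row); DictReader takes its keys from the first line, so a file without a totalReplyCount header would raise the KeyError mentioned in the question. Written for Python 3.

```python
import csv
import io

# Hypothetical CSV content with a header row, used only for illustration.
csv_text = "id,totalReplyCount\nabc,0\ndef,4\n"

reader = csv.DictReader(io.StringIO(csv_text))
rows = list(reader)

# Every value is read as a string, so convert before comparing.
ids = [row["id"] for row in rows if int(row["totalReplyCount"]) > 0]
print(ids)
```

Because the values are plain strings, a JSON object stored in a CSV cell would still need to be passed through json.loads before its keys can be accessed.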

If your input file is JSON, why not just use the json library to parse it and then run a for loop over that data? Then it is just a matter of iterating over the keys and extracting data.

import json
from pprint import pprint

with open('data.json') as data_file:    
    data = json.load(data_file)

pprint(data)
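If the file holds several JSON objects one after another, as in the question's sample, json.load stops with "Extra data" because it expects a single document. One possible workaround, sketched here for Python 3, is to decode the objects one at a time with json.JSONDecoder.raw_decode. The helper names iter_json_objects and ids_with_replies are made up for this example, and the location of "id" inside "newLevel" is an assumption taken from the sample data.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value in text, skipping separators."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it here,
        # along with the commas that separate the objects.
        if text[pos].isspace() or text[pos] == ",":
            pos += 1
            continue
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

def ids_with_replies(text):
    # Assumes "id" lives under "newLevel", as in the question's sample.
    return [obj["newLevel"]["id"]
            for obj in iter_json_objects(text)
            if obj["totalReplyCount"] > 0]

sample = '''
{ "totalReplyCount": 0,
  "newLevel": { "main": { "name": "Ronald Whitlock" },
                "id": "z12wcjdxfqvhif5ee22ys5ejzva2j5zxh04" } },
{ "totalReplyCount": 4,
  "newLevel": { "main": { "name": "other name" },
                "id": "kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd" } },
'''
print(ids_with_replies(sample))  # ['kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd']
```

The same loop also copes with one object per line, since newlines are skipped as whitespace.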

Parsing values from a JSON file using Python?

Look at Justin Peel's answer. It should help.

Parsing values from a JSON file in Python: this link has it all. See "Parsing values from a JSON file using Python?" on Stack Overflow.

Here is a shell one-liner that should solve your problem, though it's not Python.

egrep -o '"(?:totalReplyCount|id)":(.*?)$' filename | awk '/totalReplyCount/ {if ($2+0 > 0) {getline; print}}' | cut -d: -f2

output:

"kjsdbesd2wd2eedd23rf3r3r2e2dwe2edsd"
