Using Python to iterate through a JSON file to get specific attribute values

Question

I have a JSON file that looks like this:

[
{
    "contributors": null,
    "coordinates": null,
    "created_at": "Fri Aug 04 21:12:59 +0000 2017",
    "entities": {
        "hashtags": [
            {
                "indices": [
                    32,
                    39
                ],
                "text": "\ubd80\uc0b0\ucd9c\uc7a5\uc548\ub9c8"
            },
            {
                "indices": [
                    40,
                    48
                ],
                "text": "\ubd80\uc0b0\ucd9c\uc7a5\ub9c8\uc0ac\uc9c0"
            }
        ]
    },
    "text": "\uaedb",
    "retweeted_status": {
        "contributors": null,
        "coordinates": null,
        "created_at": "Fri Aug 04 20:30:06 +0000 2017",
        "display_text_range": [
            0,
            0
        ],
        "text": "hjhfbsdjsdbjsd"
    },
    "extended_tweet": {
            "display_text_range": [
                0,
                137
            ],
            "entities": {
                "hashtags": [
                    {
                        "indices": [
                            62,
                            75
                        ],
                        "text": "2ndAmendment"
                    },
                    {
                        "indices": [
                            91,
                            104
                        ],
                        "text": "1stAmendment"
                    }
                ]
            }
    }
}
]

I wrote the following Python code to count the number of text attributes in the whole JSON file.

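import json

with open("1.json") as data_file:    # file name assumed (Answer 1 uses the same one)
    data = json.load(data_file)

cnt = 0
for tweet in data:                   # the top level is a list of tweet objects
    for key, value in tweet.items():
        if key == "text":
            cnt += 1
        elif key == "retweeted_status":
            for k, v in value.items():
                if k == "text":
                    cnt += 1
        elif key == "entities":
            for hashtag in value.get("hashtags", []):
                if "text" in hashtag:
                    cnt += 1
        # "extended_tweet" nests "entities" one level deeper: difficult to loop further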

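Because the structure of the data is not constant, iterating over it is difficult. I also want to access the values of the text attributes and display them. Is there a simpler way that avoids multiple nested loops?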

Answer 1

How about using a regular expression?

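import re

# Group 1 captures the key "text"; group 2 captures everything up to the last quote on the line.
regex_chain = re.compile(r'(text)\": \"(.*)\"')

text_occurrences = []
with open('1.json') as file:
    # Scan line by line; in the pretty-printed file each key sits on its own line.
    for line in file:
        match = regex_chain.search(line)
        if match:
            text_occurrences.append({match.group(1): match.group(2)})
print(text_occurrences)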

You get a list of dictionaries, each containing the key text and the value of one occurrence:

[{'text': '\\ubd80\\uc0b0\\ucd9c\\uc7a5\\uc548\\ub9c8'}, {'text': '\\ubd80\\uc0b0\\ucd9c\\uc7a5\\ub9c8\\uc0ac\\uc9c0'}, {'text': '\\uaedb'}, {'text': 'hjhfbsdjsdbjsd'}, {'text': '2ndAmendment'}, {'text': '1stAmendment'}]
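The values above still contain literal \u escape sequences, because the regular expression works on the raw file text. A minimal sketch of the difference, assuming a single line copied from the question's file: the regex capture keeps the six-character escapes as typed, while json.loads decodes each one into a single character.

```python
import json
import re

# One line exactly as it appears in the question's file (escapes not decoded).
raw_line = '"text": "\\ubd80\\uc0b0\\ucd9c\\uc7a5\\uc548\\ub9c8"'

# Pattern copied from Answer 1.
regex_value = re.compile(r'(text)\": \"(.*)\"').search(raw_line).group(2)

# Wrap the line in braces so it becomes a complete JSON object, then parse it.
json_value = json.loads("{" + raw_line + "}")["text"]

print(regex_value)                        # \ubd80\uc0b0\ucd9c\uc7a5\uc548\ub9c8
print(len(regex_value), len(json_value))  # 36 6
```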

Answer 2

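I'm not sure how safe it is to parse JSON naively with a regular expression. In particular, (text)\": \"(.*)\" can technically match text": "abc", "text": "another", with group 1 being text and group 2 being abc", "text": "another.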
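A quick check of this failure mode, applying Answer 1's pattern to the example string above (a hypothetical line where two keys share one line, as happens in minified JSON). The greedy .* runs to the last quote on the line, so group 2 swallows the second key.

```python
import re

# Pattern copied from Answer 1.
regex_chain = re.compile(r'(text)\": \"(.*)\"')

# Hypothetical input: two "text" entries on the same line.
line = '"text": "abc", "text": "another"'

match = regex_chain.search(line)
print(match.group(1))  # text
print(match.group(2))  # abc", "text": "another
```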

It is much safer to parse the JSON with Python's standard json library and then traverse the data recursively.

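import json

def count_key(selected_key, obj):
    """Count how many times selected_key appears as a dict key, at any depth."""
    count = 0

    if isinstance(obj, list):
        # Recurse into every element of a JSON array.
        for item in obj:
            count += count_key(selected_key, item)

    elif isinstance(obj, dict):
        for key, value in obj.items():
            if key == selected_key:
                count += 1
            # Recurse into the value too, since it may contain nested objects.
            count += count_key(selected_key, value)

    return count


with open("my-json-file", "r") as json_file:
    # json.load reads and parses the file in one step.
    print(count_key("text", json.load(json_file)))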
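The question also asks to display the text values, not just count them. A minimal sketch of the same recursive traversal written as a generator that yields each value; the name iter_key_values and the trimmed sample data are hypothetical, not part of the original answer.

```python
import json


def iter_key_values(selected_key, obj):
    """Yield every value stored under selected_key, at any nesting depth."""
    if isinstance(obj, list):
        for item in obj:
            yield from iter_key_values(selected_key, item)
    elif isinstance(obj, dict):
        for key, value in obj.items():
            if key == selected_key:
                yield value
            # Recurse into the value too, in case it holds further matches.
            yield from iter_key_values(selected_key, value)


# Hypothetical sample: a trimmed-down version of the question's JSON.
sample = json.loads("""
[
  {
    "entities": {"hashtags": [{"text": "\\ubd80\\uc0b0"}]},
    "text": "\\uaedb",
    "retweeted_status": {"text": "hjhfbsdjsdbjsd"},
    "extended_tweet": {
      "entities": {"hashtags": [{"text": "2ndAmendment"},
                                {"text": "1stAmendment"}]}
    }
  }
]
""")

texts = list(iter_key_values("text", sample))
print(len(texts))  # 5
for text in texts:
    print(text)
```

Because it yields one item per matching key, len(texts) equals what count_key("text", sample) returns, and the values come out already decoded by the json module.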

Answer 1: score 1, accepted, 2017-08-08 10:37:14

Answer 2: score 0, 2017-08-08 22:36:04