
How to parse for specific unique values from a JSON lines file with Python and store into an array

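The program needs to parse a JSON Lines file and store data in an array. The only data that actually needs to go into the array is the set of unique values that follow "SRC/Word1".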

Here is a sample of the JSON Lines file:

{"Event UTC": "2020-12-21 05:23:06", "Event Time": "00:23:06:94", "SRC/Word1": " ", "Word2": " ", "Word3": " "}
{"Event UTC": "2020-12-21 05:30:53", "Event Time": "00:30:53:95", "SRC/Word1": "E1F25701", "Word2": "A29C7E68", "Word3": " "}
{"Event UTC": "2020-12-21 05:31:04", "Event Time": "00:31:04:34", "SRC/Word1": "E1F25701", "Word2": "D529F3D7", "Word3": " "}
{"Event UTC": "2020-12-21 10:18:54", "Event Time": "05:18:54:45", "SRC/Word1": "E15511D7", "Word2": "1F6FC55C", "Word3": " "}

Here is my code so far:

import json

data = []
with open('stela_zerrl_t01_201222_084053_test.json') as fin:
    for line in fin:
        data.append(json.loads(line))
        print(data)

The data array should end up containing something like data = [E1F25701, E15511D7]

Any idea how to do this?

See below (data represents the lines loaded from the file):

data = [{"Event UTC": "2020-12-21 05:23:06", "Event Time": "00:23:06:94", "SRC/Word1": " ", "Word2": " ", "Word3": " "},
        {"Event UTC": "2020-12-21 05:30:53", "Event Time": "00:30:53:95", "SRC/Word1": "E1F25701", "Word2": "A29C7E68",
         "Word3": " "},
        {"Event UTC": "2020-12-21 05:31:04", "Event Time": "00:31:04:34", "SRC/Word1": "E1F25701", "Word2": "D529F3D7",
         "Word3": " "},
        {"Event UTC": "2020-12-21 10:18:54", "Event Time": "05:18:54:45", "SRC/Word1": "E15511D7", "Word2": "1F6FC55C",
         "Word3": " "}]
data_sub_set = list(set(x["SRC/Word1"] for x in data if x["SRC/Word1"].strip()))
print(data_sub_set)

Output (element order may vary, because a set is unordered):

['E1F25701', 'E15511D7']
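
If the values should come out in the order they first appear in the file, one possible variation on the one-liner above is `dict.fromkeys`, which keeps insertion order in Python 3.7 and later. This is only a minimal sketch: the records are abbreviated to the relevant key, and the name `unique_in_order` is made up for illustration.

```python
# Sample records standing in for the parsed lines (only the relevant key is kept).
data = [
    {"SRC/Word1": " "},
    {"SRC/Word1": "E1F25701"},
    {"SRC/Word1": "E1F25701"},
    {"SRC/Word1": "E15511D7"},
]

# dict keys are unique and keep insertion order, so this drops duplicates
# while preserving the order in which each value first appears.
unique_in_order = list(dict.fromkeys(
    x["SRC/Word1"] for x in data if x["SRC/Word1"].strip()
))
print(unique_in_order)  # ['E1F25701', 'E15511D7']
```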

A parsed JSON object just needs to be accessed like a dictionary. If you are looking for the SRC/Word1 field, you would ask for it like this:

import json

data = []
with open('stela_zerrl_t01_201222_084053_test.json') as fin:
    for line in fin:
        data.append(json.loads(line)['SRC/Word1'])  # note the field access here
        print(data)

However, if the JSON does not always have that field, you may want to skip empty-string entries or add some error handling.
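
As a rough sketch of what that error handling might look like, the hypothetical helper below (`read_field` is not part of the original answer) skips empty lines, lines that are not valid JSON, records without the field, and whitespace-only values. It reads from an in-memory `io.StringIO` here so it can run without the real file; an open file object could be passed in the same way.

```python
import io
import json

def read_field(lines, key="SRC/Word1"):
    """Collect non-blank values of `key`, skipping lines that cannot be used."""
    values = []
    for line in lines:
        if not line.strip():
            continue  # ignore empty lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(record, dict):
            continue  # skip JSON values that are not objects
        value = record.get(key, "")  # missing field falls back to ""
        if isinstance(value, str) and value.strip():
            values.append(value)
    return values

# Illustrative input: one blank value, one missing field, one malformed line.
sample = io.StringIO(
    '{"SRC/Word1": " ", "Word2": " "}\n'
    '{"SRC/Word1": "E1F25701", "Word2": "A29C7E68"}\n'
    '{"Word2": "no source field here"}\n'
    'this line is not JSON\n'
    '{"SRC/Word1": "E15511D7", "Word2": "1F6FC55C"}\n'
)
print(read_field(sample))  # ['E1F25701', 'E15511D7']
```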

Edit: I just saw your comment about skipping duplicates and omitting empty entries.

import json

data = []
with open('stela_zerrl_t01_201222_084053_test.json') as fin:
    for line in fin:
        value = json.loads(line).get('SRC/Word1', '')
        # skip empty or whitespace-only values and values already in the list
        if value.strip() and value not in data:
            data.append(value)
            print(data)
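
The `value not in data` check scans the whole list on every line, which is fine for small files. For larger inputs, one possible variation tracks the values already seen in a `set` while still producing them in first-seen order. This is a sketch under that assumption, and the generator name `iter_unique` is hypothetical.

```python
import json

def iter_unique(lines, key="SRC/Word1"):
    """Yield each non-blank value of `key` once, in first-seen order."""
    seen = set()  # set membership tests are constant time on average
    for line in lines:
        value = json.loads(line).get(key, "")
        if value.strip() and value not in seen:
            seen.add(value)
            yield value

# Abbreviated sample lines; an open file object can be passed instead.
lines = [
    '{"SRC/Word1": " "}',
    '{"SRC/Word1": "E1F25701"}',
    '{"SRC/Word1": "E1F25701"}',
    '{"SRC/Word1": "E15511D7"}',
]
print(list(iter_unique(lines)))  # ['E1F25701', 'E15511D7']
```

Because it is a generator, it can be fed the open file directly (for example `list(iter_unique(fin))` inside the `with` block) without loading every line into memory first.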

