
How to write a JSON null value as an empty line in a new file (converting a JSON-based log into column format, i.e., one file per column)

Example of the log file:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}

It will generate 5 files:

1. timestamp.column
2. Field1.column
3. Field_Doc.f1.column
4. Field_Doc.f2.column
5. Field_Doc.f3.column

Example content of timestamp.column:

2022-01-14T00:12:21.000
2022-01-18T00:15:51.000

I'm facing a problem when the value of a key is null or undefined, for example:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}

Can someone help me out here?

Note that the input file is actually NDJSON (newline-delimited JSON). See the docs.

That being said, since furas already gave an excellent answer on how to process the NDJSON log file, I'm going to skip that part. Do note that there's a library for dealing with NDJSON files; see PyPI.

His code needs only a minimal adjustment to deal with the undefined edge case. null is a valid JSON value, so his code doesn't break on that; undefined is not valid JSON, so json.loads() rejects it.

You can fix this easily by calling str.replace() on each line before passing it to json.loads(), so that the line becomes valid JSON. Then, while writing, check whether the value is None and replace it with an empty string. Note that None is the Python equivalent of JSON's null.

Please note the inclusion of the colon (': ') in the replace call: it is there to prevent false positives, such as the word undefined appearing inside a string value.
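As a quick check of the behaviour described here, the following minimal sketch shows that json.loads() rejects a bare undefined but accepts the same line once it has been rewritten to null. The helper name parse_line is made up for illustration and is not part of furas's code.

```python
import json

def parse_line(line):
    """Hypothetical helper: rewrite bare `undefined` values to `null`, then parse."""
    # Matching ': undefined' (with the colon) targets values rather than
    # every occurrence of the word "undefined".
    return json.loads(line.replace(': undefined', ': null'))

raw = '{"Field1": null, "Field_Doc": {"f1": undefined}}'

try:
    json.loads(raw)
    parsed_without_fix = True
except json.JSONDecodeError:
    # `undefined` is not a JSON literal, so the raw line is rejected
    parsed_without_fix = False

data = parse_line(raw)
print(parsed_without_fix)  # False
print(data)                # {'Field1': None, 'Field_Doc': {'f1': None}}
```

This is still a plain text substitution, so a string value that happens to contain the exact text ': undefined' would be altered as well.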

Main loop logic:

for line in file_obj:
    # the replace function makes it valid JSON
    data = json.loads(line.replace(': undefined', ': null'))
    print(data)
    process_dict(data, write_func)

Adjusted write_func() function:

def write_func(key, value):
    with open(key + '.column', "a") as f:
        # JSON null arrives as None after json.loads(); write it as an empty line.
        if value is None:
            value = ''
        f.write(str(value) + "\n")

I used the following as the input:

{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}
{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}
