How to write a JSON null value as an empty line in a new file (converting a JSON-based log into column format, i.e., one file per column)
Example of log file:
{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}
It will generate 5 files:
1. timestamp.column
2. Field1.column
3. Field_Doc.f1.column
4. Field_Doc.f2.column
5. Field_Doc.f3.column
Example content of timestamp.column:
2022-01-14T00:12:21.000
2022-01-18T00:15:51.000
I'm facing a problem when the value of a key is null or undefined, for example:
{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}
Can someone help me out here?
Note, the input file is actually an NDJSON. See the docs.
That being said, since furas already gave an excellent answer on how to process the NDJSON logfile, I'm going to skip that part. Do note that there's a library to deal with NDJSON files. See PyPI.
His code needs minimal adjustment to deal with the undefined edge case. The null value is a valid JSON value, so his code doesn't break on that.
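The difference between the two values can be checked directly with the standard json module. This is only a small illustration with made-up one-key inputs, not part of the original logfile code:

```python
import json

# null is valid JSON and is parsed into Python's None
parsed = json.loads('{"Field1": null}')
print(parsed)

# undefined is a JavaScript value, not JSON, so the parser rejects it
try:
    json.loads('{"f1": undefined}')
    undefined_ok = True
except json.JSONDecodeError as exc:
    undefined_ok = False
    print("invalid JSON:", exc)
```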
You can fix this easily with a string.replace() on each line before passing it to json.loads(), so it becomes valid JSON, and then you can check while writing whether the value is None and replace it with an empty string. Note that None is the Python equivalent of JSON's null.
Please note the inclusion of the colon (:) in the replace function; it's there to prevent false positives, such as rewriting the word undefined where it appears inside a string value.
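A quick way to see why the colon matters is to compare both replacements on a line where undefined also appears inside a string. The input line here is invented for the illustration:

```python
import json

# Hypothetical log line: "undefined" occurs in a string and as a bare value
line = '{"msg": "undefined behaviour", "f1": undefined}'

# Replacing the bare word also rewrites the text inside the string value
naive = json.loads(line.replace('undefined', 'null'))

# Including the colon only touches the token in value position
careful = json.loads(line.replace(': undefined', ': null'))

print(naive)
print(careful)
```

This is still a text-level workaround: a string value that itself contains ": undefined" would be altered, and a line written without a space after the colon would not be matched.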
main loop logic
for line in file_obj:
    # the replace function makes it valid JSON
    data = json.loads(line.replace(': undefined', ': null'))
    print(data)
    process_dict(data, write_func)
write_func() function adjustment
def write_func(key, value):
    with open(key + '.column', "a") as f:
        # if the value is None (JSON null), make it an empty string
        if value is None:
            value = ''
        f.write(str(value) + "\n")
I used the following as the input string:
{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}
{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}
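For anyone who wants to try this end to end, here is a self-contained sketch that runs the three lines above through the adjusted logic. The process_dict() shown here is a hypothetical stand-in for the helper from the other answer (which is not reproduced on this page), and writing to a temporary directory is my own assumption to keep the example tidy:

```python
import io
import json
import os
import tempfile

LOG = '''{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}
{"timestamp": "2022-01-14T00:12:21.000", "Field1": null, "Field_Doc": {"f1": undefined}}
'''

# Assumption: column files go into a fresh temporary directory
out_dir = tempfile.mkdtemp()

def write_func(key, value):
    # Append one value per line; None (JSON null) becomes an empty line
    with open(os.path.join(out_dir, key + '.column'), 'a') as f:
        f.write(('' if value is None else str(value)) + '\n')

def process_dict(data, func, prefix=''):
    # Hypothetical stand-in: walk nested dicts and build dotted column names
    for key, value in data.items():
        name = prefix + key
        if isinstance(value, dict):
            process_dict(value, func, name + '.')
        else:
            func(name, value)

for line in io.StringIO(LOG):
    # the replace call turns undefined into null so the line is valid JSON
    data = json.loads(line.replace(': undefined', ': null'))
    process_dict(data, write_func)

with open(os.path.join(out_dir, 'Field_Doc.f1.column')) as f:
    f1_lines = f.read().split('\n')
print(f1_lines)
```

Note that this sketch only writes a line for keys that are present in a record, so a key that is missing entirely (like Field1 in the second line) does not produce an empty line in its column file.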