Using Pandas to convert JSON to CSV with specific fields
I am currently trying to convert a JSON file to a CSV file using Pandas.
The code I'm using now is able to convert the JSON to a CSV file:
import pandas as pd
json_data = pd.read_json("out1.json")
from pandas.io.json import json_normalize
df = json_normalize(json_data["events"])
df.to_csv("out.csv")
This is my JSON file:
{
  "events": [
    {
      "raw": "{\"level\": \"INFO\", \"message\": \"Disabled camera with QR scan on by 80801234 at Area A\n\"}",
      "logtypes": [
        "json"
      ],
      "timestamp": 1537190572023,
      "unparsed": null,
      "logmsg": "{\"level\": \"INFO\", \"message\": \"Disabled camera with QR scan on by 80801234 at Area A\n\"}",
      "id": "c77afb4c-ba7c-11e8-8000-12b233ae723a",
      "tags": [
        "INFO"
      ],
      "event": {
        "json": {
          "message": "Disabled camera with QR scan on by 80801234 at Area A\n",
          "level": "INFO"
        },
        "http": {
          "clientHost": "116.197.237.29",
          "contentType": "text/plain; charset=UTF-8"
        }
      }
    },
    {
      "raw": "{\"level\": \"INFO\", \"message\": \"Employee number saved successfully.\"}",
      "logtypes": [
        "json"
      ],
      "timestamp": 1537190528619,
      "unparsed": null,
      "logmsg": "{\"level\": \"INFO\", \"message\": \"Employee number saved successfully.\"}",
      "id": "ad9c0175-ba7c-11e8-803d-12b233ae723a",
      "tags": [
        "INFO"
      ],
      "event": {
        "json": {
          "message": "Employee number saved successfully.",
          "level": "INFO"
        },
        "http": {
          "clientHost": "116.197.237.29",
          "contentType": "text/plain; charset=UTF-8"
        }
      }
    }
  ]
}
But what I want is just some of the fields (timestamp, level, message) from the JSON file, not all of them.
I have tried a variety of ways:
df = json_normalize(json_data["timestamp"])  # gives a KeyError on 'timestamp'
df = json_normalize(json_data, 'timestamp', ['event', 'json', ['level', 'message']])  # TypeError: string indices must be integers
Where did I go wrong?
I don't think json_normalize is intended to work on this specific orientation. I could be wrong, but from the documentation it appears that normalization means "deal with lists within each dictionary".
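For context on the KeyError in the question, here is a minimal sketch of what pd.read_json produces for a file of this shape. The inline text is a trimmed stand-in for out1.json (an assumption, not the real file), and io.StringIO is used only so the example runs without a file on disk. It suggests that the frame has a single "events" column, so "timestamp" is not a column at all; it sits one level down inside each event dict.

```python
import io

import pandas as pd

# Trimmed stand-in for out1.json: same top-level shape, one event only.
text = (
    '{"events": [{"timestamp": 1537190572023, '
    '"event": {"json": {"level": "INFO", "message": "ok"}}}]}'
)

json_data = pd.read_json(io.StringIO(text))

# The only column is the top-level key; each cell holds one event dict.
print(list(json_data.columns))

# "timestamp" is not a column, which is why json_data["timestamp"] fails.
has_timestamp = "timestamp" in json_data.columns

# The value is reachable inside the event dict itself.
first_event = json_data["events"].iloc[0]
print(first_event["timestamp"])
```

The second error in the question probably has a related cause: iterating over a DataFrame yields its column names, which are strings, so json_normalize ends up indexing a string with 'timestamp'.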
Assume data is
import json
data = json.load(open('out1.json'))['events']
Look at the first entry:
data[0]['timestamp']
1537190572023
json_normalize wants this to be a list:
[{'timestamp': 1537190572023}]
data2
I don't actually recommend this approach.
If we create data2 accordingly:
data2 = [{**d, **{'timestamp': [{'timestamp': d['timestamp']}]}} for d in data]
We can use json_normalize:
json_normalize(
    data2, 'timestamp',
    [['event', 'json', 'level'], ['event', 'json', 'message']]
)
       timestamp  event.json.level                                event.json.message
0  1537190572023              INFO  Disabled camera with QR scan on by 80801234 a...
1  1537190528619              INFO               Employee number saved successfully.
I think it's simpler to just do
pd.DataFrame([
    (d['timestamp'],
     d['event']['json']['level'],
     d['event']['json']['message'])
    for d in data
], columns=['timestamp', 'level', 'message'])
       timestamp  level                                           message
0  1537190572023   INFO  Disabled camera with QR scan on by 80801234 a...
1  1537190528619   INFO               Employee number saved successfully.
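The list comprehension above assumes every event carries event.json with both level and message. If some events might lack that part (an assumption; the sample file does not show such a case), a tolerant variant could look like the sketch below. pick_fields is a hypothetical helper name, and the second record is made up purely to exercise the missing-key path.

```python
import pandas as pd


def pick_fields(event):
    """Hypothetical helper: pull the three wanted fields, tolerating gaps."""
    payload = event.get("event", {}).get("json", {})
    return {
        "timestamp": event.get("timestamp"),
        "level": payload.get("level"),
        "message": payload.get("message"),
    }


data = [
    {
        "timestamp": 1537190528619,
        "event": {"json": {"level": "INFO",
                           "message": "Employee number saved successfully."}},
    },
    # Made-up record with no "json" part, only to show the fallback.
    {"timestamp": 1537190528620,
     "event": {"http": {"clientHost": "116.197.237.29"}}},
]

df = pd.DataFrame(
    [pick_fields(d) for d in data],
    columns=["timestamp", "level", "message"],
)
print(df)
```

Missing values come through as None/NaN instead of raising a KeyError, which may or may not be what you want for a log export.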
json_normalize
But without the fancy arguments:
json_normalize(data).pipe(
    lambda d: d[['timestamp']].join(
        d.filter(like='event.json')
    )
)
       timestamp  event.json.level                                event.json.message
0  1537190572023              INFO  Disabled camera with QR scan on by 80801234 a...
1  1537190528619              INFO               Employee number saved successfully.
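To get from there to the CSV the question asks for, one possible end-to-end sketch follows. It assumes pandas 1.0 or later, where json_normalize is available as pd.json_normalize, and uses two inline events trimmed from the sample instead of reading out1.json. The short column names and the stripping of the trailing newline in message are my own choices, not something the question requires.

```python
import io

import pandas as pd

# Two events shaped like those in out1.json, trimmed to the relevant keys.
data = [
    {"timestamp": 1537190572023,
     "event": {"json": {"level": "INFO", "message": "Disabled camera\n"}}},
    {"timestamp": 1537190528619,
     "event": {"json": {"level": "INFO",
                        "message": "Employee number saved successfully."}}},
]

# Nested keys become dotted column names such as "event.json.level".
flat = pd.json_normalize(data)

# Map the dotted names to the short names wanted in the CSV header.
wanted = {
    "timestamp": "timestamp",
    "event.json.level": "level",
    "event.json.message": "message",
}
df = flat[list(wanted)].rename(columns=wanted)

# Drop the trailing newline some messages carry so each record is one CSV line.
df["message"] = df["message"].str.strip()

buffer = io.StringIO()  # replace with a path such as "out.csv" to write a file
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Passing index=False keeps the row numbers out of the file, so the CSV contains only the three requested fields.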