Converting JSON to CSV with specific fields using Pandas

Question

I am currently trying to convert a JSON file to a CSV file using Pandas.

The code I'm using now is able to convert the JSON to a CSV file.

import pandas as pd
from pandas.io.json import json_normalize

json_data = pd.read_json("out1.json")
df = json_normalize(json_data["events"])
df.to_csv("out.csv")

This is my JSON file:

{
  "events": [
    {
      "raw": "{\"level\": \"INFO\", \"message\": \"Disabled camera with QR scan on  by 80801234 at Area A\n\"}",
      "logtypes": [
        "json"
      ],
      "timestamp": 1537190572023,
      "unparsed": null,
      "logmsg": "{\"level\": \"INFO\", \"message\": \"Disabled camera with QR scan on  by 80801234 at Area A\n\"}",
      "id": "c77afb4c-ba7c-11e8-8000-12b233ae723a",
      "tags": [
        "INFO"
      ],
      "event": {
        "json": {
          "message": "Disabled camera with QR scan on  by 80801234 at Area A\n",
          "level": "INFO"
        },
        "http": {
          "clientHost": "116.197.237.29",
          "contentType": "text/plain; charset=UTF-8"
        }
      }
    },
    {
      "raw": "{\"level\": \"INFO\", \"message\": \"Employee number saved successfully.\"}",
      "logtypes": [
        "json"
      ],
      "timestamp": 1537190528619,
      "unparsed": null,
      "logmsg": "{\"level\": \"INFO\", \"message\": \"Employee number saved successfully.\"}",
      "id": "ad9c0175-ba7c-11e8-803d-12b233ae723a",
      "tags": [
        "INFO"
      ],
      "event": {
        "json": {
          "message": "Employee number saved successfully.",
          "level": "INFO"
        },
        "http": {
          "clientHost": "116.197.237.29",
          "contentType": "text/plain; charset=UTF-8"
        }
      }
    }
  ]
}

But what I want is just some of the fields (timestamp, level, message) from the JSON file, not all of them.

I have tried a variety of ways:

df = json_normalize(json_data["timestamp"])  # raises KeyError: 'timestamp'

df = json_normalize(json_data, 'timestamp', ['event', 'json', ['level', 'message']])  # raises TypeError: string indices must be integers

Where did I go wrong?

Answer 1

I don't think json_normalize is intended to work on this specific orientation. I could be wrong, but from the documentation it appears that normalization means "deal with lists within each dictionary".
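As for the KeyError in the question, my reading is that pd.read_json turns the top-level "events" key into the only column, so there is no 'timestamp' column to index. A small sketch of that, using a trimmed inline stand-in for out1.json (the json_text string is made up for illustration):

```python
import io
import pandas as pd

# Trimmed stand-in for out1.json: one top-level key holding a list of events.
json_text = '{"events": [{"timestamp": 1537190572023}, {"timestamp": 1537190528619}]}'
json_data = pd.read_json(io.StringIO(json_text))

# The only column is "events"; each cell holds one event dictionary.
print(list(json_data.columns))
print("timestamp" in json_data.columns)

# The timestamp lives inside each cell, not in a column of its own.
first_event = json_data["events"].iloc[0]
print(first_event["timestamp"])
```

That is why the snippets below load the file with the json module and work on the plain list of event dictionaries instead.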

Assume data is

import json
data = json.load(open('out1.json'))['events']

Look at the first entry

data[0]['timestamp']

1537190572023

json_normalize wants this to be a list

[{'timestamp': 1537190572023}]

Create augmented `data2`

I don't actually recommend this approach.
If we create data2 accordingly:

data2 = [{**d, **{'timestamp': [{'timestamp': d['timestamp']}]}} for d in data]

We can use json_normalize

json_normalize(
    data2, 'timestamp',
    [['event', 'json', 'level'], ['event', 'json', 'message']]
)

       timestamp event.json.level                                 event.json.message
0  1537190572023             INFO  Disabled camera with QR scan on  by 80801234 a...
1  1537190528619             INFO                Employee number saved successfully.

Comprehension

I think it's simpler to just do

pd.DataFrame([
    (d['timestamp'],
     d['event']['json']['level'],
     d['event']['json']['message'])
    for d in data
], columns=['timestamp', 'level', 'message'])

       timestamp level                                            message
0  1537190572023  INFO  Disabled camera with QR scan on  by 80801234 a...
1  1537190528619  INFO                Employee number saved successfully.
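Since the goal in the question is a CSV, here is a minimal sketch of the comprehension approach carried through to to_csv. The inline events list is a trimmed stand-in for the question's out1.json, and stripping the trailing newline from message is my own assumption about what you'd want in the file, not something the question asks for.

```python
import pandas as pd

# Trimmed stand-in for json.load(open('out1.json'))['events'].
events = [
    {"timestamp": 1537190572023,
     "event": {"json": {"level": "INFO",
                        "message": "Disabled camera with QR scan on  by 80801234 at Area A\n"}}},
    {"timestamp": 1537190528619,
     "event": {"json": {"level": "INFO",
                        "message": "Employee number saved successfully."}}},
]

# Pull out only the three wanted fields from each event.
rows = [
    (d["timestamp"],
     d["event"]["json"]["level"],
     d["event"]["json"]["message"].strip())  # assumption: drop the trailing "\n"
    for d in events
]
df = pd.DataFrame(rows, columns=["timestamp", "level", "message"])

# With no path, to_csv returns the CSV as a string; index=False omits row numbers.
csv_text = df.to_csv(index=False)
print(csv_text)
```

To write the file instead of printing it, pass a path: df.to_csv("out.csv", index=False).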

`json_normalize`

But without the fancy arguments

json_normalize(data).pipe(
    lambda d: d[['timestamp']].join(
        d.filter(like='event.json')
    )
)

       timestamp event.json.level                                 event.json.message
0  1537190572023             INFO  Disabled camera with QR scan on  by 80801234 a...
1  1537190528619             INFO                Employee number saved successfully.
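If you'd rather have the short column names from the question, one option is to select and rename the dotted columns that json_normalize produces. This is a sketch, assuming pandas 1.0 or later, where the function is also available as pd.json_normalize. The events list is a trimmed stand-in for the real data and wanted is just an illustrative name.

```python
import pandas as pd

# Trimmed stand-in for json.load(open('out1.json'))['events'].
events = [
    {"timestamp": 1537190572023,
     "event": {"json": {"level": "INFO", "message": "Disabled camera"},
               "http": {"clientHost": "116.197.237.29"}}},
    {"timestamp": 1537190528619,
     "event": {"json": {"level": "INFO", "message": "Employee number saved successfully."},
               "http": {"clientHost": "116.197.237.29"}}},
]

# Nested keys are flattened into dotted column names such as "event.json.level".
flat = pd.json_normalize(events)

# Map each flattened column to keep onto the short name wanted in the CSV.
wanted = {
    "timestamp": "timestamp",
    "event.json.level": "level",
    "event.json.message": "message",
}
df = flat[list(wanted)].rename(columns=wanted)
print(df)
```

From there, df.to_csv("out.csv", index=False) writes just those three columns.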
