
Complex JSON file to CSV in Python

I need to convert a complex JSON file to CSV using Python. I have tried a lot of code without success, so I came here for help. I updated the question: the JSON file contains about a million entries, and I need to convert them to CSV format.

csv file

{
    "_id": {
        "$oid": "2e3230"
    },
    "add": {
        "address1": {
            "address": "kvartira 14",
            "zipcode": "10005",
        },
        "name": "Evgiya Kovava",
        "address2": {
            "country": "US",
            "country_name": "NY",
        }
    }
}
{
    "_id": {
        "$oid": "2d118c8bo"
    },
    "add": {
        "address1": {
            "address": "kvartira 14",
            "zipcode": "52805",
        },
        "name": "Eiya tceva",
        "address2": {
            "country": "US",
            "country_name": "TX",
        }
    }
}

import pandas as pd

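# JSON's null is not a Python name; bind it to a placeholder string
# so that the dict literal below evaluates.
null = 'null'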

data = {
    "_id": {
        "$oid": "2e3230s314i5dc07e118c8bo"
    },
    "add": {
        "address": {
            "address_type": "Door",
            "address": "kvartira 14",
            "city": "new york",
            "region": null,
            "zipcode": "10005",
        },
        "name": "Evgeniya Kovantceva",
        "type": "Private person",
        "code": null,
        "additional_phone_nums": null,
        "email": null,
        "notifications": [],
        "address": {
            "address": "kvartira 14",
            "city": "new york",
            "region": null,
            "zipcode": "10005",
            "country": "US",
            "country_name": "NY",
        }
    }
}

df = pd.json_normalize(data)
df.to_csv('yourpath.csv')

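Beware the null value: null is not defined in Python, so I bound it to a placeholder string above. Also, the "address" nested dictionary appears twice inside "add", almost identically. Is that intended? In a Python dict literal the second one overwrites the first.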
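As a side note, parsing the text with the json module instead of pasting it as a Python literal avoids the null workaround, because json.loads maps null to None. A minimal sketch, assuming the input is valid JSON (the sample text here is made up for illustration); it also shows that a repeated key keeps only its last value:

```python
import json

# Hypothetical sample: one null value and one repeated key.
text = '{"region": null, "address": {"city": "a"}, "address": {"city": "b"}}'

record = json.loads(text)

# null becomes None; the second "address" replaces the first.
print(record["region"])
print(record["address"])
```

So a duplicated key does not raise an error; the earlier value is silently dropped.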

EDIT

OK, after your additional information it looks like json.JSONDecoder() is what you need.

Originally posted by @pschill at this link: how to analyze json objects that are NOT separated by comma (preferably in Python)

I tried his code on your data:

import json
import pandas as pd

data = """{
    "_id": {
        "$oid": "2e3230"
    },
    "add": {
        "address1": {
            "address": "kvartira 14",
            "zipcode": "10005"
        },
        "name": "Evgiya Kovava",
        "address2": {
            "country": "US",
            "country_name": "NY"
        }
    }
}
{
    "_id": {
        "$oid": "2d118c8bo"
    },
    "add": {
        "address1": {
            "address": "kvartira 14",
            "zipcode": "52805"
        },
        "name": "Eiya tceva",
        "address2": {
            "country": "US",
            "country_name": "TX"
        }
    }
}"""

Keep in mind that your data also has trailing commas (the last comma right before every closing brace), which make it invalid JSON, so the parser rejects it.

You have to remove them with a regex or some other approach I am not familiar with. For the purpose of this answer I removed them manually.
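One possible regex for this, as a rough sketch rather than a tested solution for the full file: drop any comma that is followed only by whitespace and then a closing brace or bracket. The helper name strip_trailing_commas is made up here, and the pattern is naive: it would also alter a string value that happens to contain ",}" or ",]".

```python
import json
import re

def strip_trailing_commas(text):
    """Remove commas that sit directly before a closing } or ]."""
    # Keep the whitespace and the closing character, drop only the comma.
    return re.sub(r",(\s*[}\]])", r"\1", text)

# Small made-up sample with the same kind of trailing commas.
raw = '{"a": {"b": "1",\n    },\n "c": [1, 2,],}'
cleaned = strip_trailing_commas(raw)

print(json.loads(cleaned))
```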

So after that I tried this:

content = data
parsed_values = []
decoder = json.JSONDecoder()
while content:
    value, new_start = decoder.raw_decode(content)
    content = content[new_start:].strip()
    # You can handle the value directly in this loop:
    # print("Parsed:", value)
    # Or you can store it in a container and use it later:
    parsed_values.append(value)

which gave me an error, but the list still appears to be populated with all the values:

parsed_values
[{'_id': {'$oid': '2e3230'},
  'add': {'address1': {'address': 'kvartira 14', 'zipcode': '10005'},
   'name': 'Evgiya Kovava',
   'address2': {'country': 'US', 'country_name': 'NY'}}},
 {'_id': {'$oid': '2d118c8bo'},
  'add': {'address1': {'address': 'kvartira 14', 'zipcode': '52805'},
   'name': 'Eiya tceva',
   'address2': {'country': 'US', 'country_name': 'TX'}}}]
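A variation on the loop above, offered as a sketch only: instead of re-slicing the string after every object (which copies the remainder each time), pass a start index to raw_decode and skip whitespace by hand, since raw_decode does not accept leading whitespace. The generator name iter_json_objects is hypothetical, and this assumes the whole text fits in memory and the trailing commas are already gone.

```python
import json

def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated values."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # Skip whitespace between objects; raw_decode would fail on it.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns the value and the index just past it.
        value, pos = decoder.raw_decode(text, pos)
        yield value

sample = '{"a": 1}\n{"a": 2}\n  {"a": {"b": 3}}\n'
objects = list(iter_json_objects(sample))
print(objects)
```

The resulting list can be passed to pd.json_normalize in the same way as parsed_values.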

Next I did:

df = pd.json_normalize(parsed_values)

which worked fine. You can always save that to a CSV with:

df.to_csv('yourpath.csv')
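If pandas is not available, roughly the same result can be sketched with the standard library: flatten each nested record into dotted column names (the naming json_normalize uses by default) and write the rows with csv.DictWriter. This is not how pandas does it internally; the flatten helper is hypothetical, and the sketch assumes every record has the same keys and that no lists need expanding.

```python
import csv
import io

def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat

records = [
    {"_id": {"$oid": "2e3230"},
     "add": {"address1": {"address": "kvartira 14", "zipcode": "10005"},
             "name": "Evgiya Kovava",
             "address2": {"country": "US", "country_name": "NY"}}},
]
rows = [flatten(r) for r in records]
fieldnames = list(rows[0])  # assumption: all records share these keys

# StringIO keeps the sketch self-contained; for a real file use
# open('yourpath.csv', 'w', newline='') instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```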

Tell me if that helped.

Your JSON is quite problematic after all: duplicate keys (a problem), null values (unreadable as a Python literal), trailing commas (unreadable for the JSON parser), dicts not separated by commas... It didn't catch my eye at first :P
