
Convert JSON to Pandas Dataframe in Python

The data is in JSON format, as below:

dict = {"a":1,"b":2,"c":[{dic 1},{dic2},...so on]}

where dic 1 is defined as below, and "c" holds a list of such dictionaries:

dic 1 = {"d":4,"e":{"f":6,"g":7},"h":{"i":9,"j":[10,11,12]},"m":13}

So the whole JSON file looks like this:

dict = {"a":1,"b":2,"c":[{"d":4,"e":{"f":6,"g":7},"h":{"i":9,"j":[10,11,12]},"m":13},{dic2},...so on]}

Now I want to store this data as a Pandas DataFrame like the table below. Any suggestions, please?

Expected Output: [image: DataFrame output]

The structure of your JSON is complex. Make it simple!

Your code won't run: it raises TypeError: unhashable type: 'dict', because {dic1} is a set literal and a dict cannot be a member of a set. To solve this, unpack each variable you use inside the main 'dict' (that is, write **dic1).

Even with that, you end up with 2 rows and 3 columns. Why? The data in key 'c' is a list of dicts, and pandas interprets the items of a list as the row values of a single column. Reorganize the JSON file.

Lastly, avoid using 'dict' as a variable name, since it shadows the built-in type.
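The points in this answer can be checked with a short sketch. The second record, dic2, is elided in the question, so the one below is a made-up placeholder; the variable name data is likewise only illustrative.

```python
import pandas as pd

dic1 = {"d": 4, "e": {"f": 6, "g": 7}, "h": {"i": 9, "j": [10, 11, 12]}, "m": 13}
# Hypothetical second record: the question does not show dic2
dic2 = {"d": 5, "e": {"f": 8, "g": 9}, "h": {"i": 1, "j": [2, 3]}, "m": 14}

# {dic1} is a set literal, and a dict cannot be a set member
try:
    bad = {"a": 1, "b": 2, "c": [{dic1}, {dic2}]}
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Unpacking with ** builds a new dict from the items of dic1 instead
data = {"a": 1, "b": 2, "c": [{**dic1}, {**dic2}]}

# The scalars "a" and "b" are broadcast; each item of "c" becomes one row
df = pd.DataFrame(data)
print(error_message)
print(df.shape)
```

Column "c" still holds whole dictionaries at this point, which is why the nested data has to be flattened separately.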

Try this:

import json
import pandas as pd
from glob import glob

# Convert one JSON string into a flat Python dictionary

def convert(x):
    ob = json.loads(x)
    for k, v in ob.copy().items():
        if isinstance(v, list):
            # str() each item so non-string lists such as [10, 11, 12] can be joined
            ob[k] = ','.join(map(str, v))
        elif isinstance(v, dict):
            for kk, vv in v.items():
                ob['%s_%s' % (k, kk)] = vv
            del ob[k]
    return ob

# Each *.json file must hold one JSON object per line, since convert() parses line by line
for json_filename in glob('*.json'):
    csv_filename = '%s.csv' % json_filename[:-5]
    print('Converting %s to %s' % (json_filename, csv_filename))
    with open(json_filename, encoding='utf-8') as f:
        df = pd.DataFrame([convert(line) for line in f if line.strip()])
    df.to_csv(csv_filename, encoding='utf-8', index=False)

# Load the converted CSV files into DataFrames
data1 = pd.read_csv('data1.csv')
data2 = pd.read_csv('data2.csv')
data3 = pd.read_csv('data3.csv')
data4 = pd.read_csv('data4.csv')
data5 = pd.read_csv('data5.csv')

https://gist.github.com/Elsaveram/3258db49eaac5e258401338ae17139a3
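As a quick check of what convert produces, here is a minimal sketch that runs it on a single record taken from the question. It assumes the list branch joins with map(str, ...) so that numeric lists do not raise; the sample line and the names flat and df are only illustrative.

```python
import json
import pandas as pd

def convert(x):
    """Flatten one level of nesting in a JSON string."""
    ob = json.loads(x)
    for k, v in ob.copy().items():
        if isinstance(v, list):
            # Join list items into one comma-separated string
            ob[k] = ','.join(map(str, v))
        elif isinstance(v, dict):
            # Promote each nested key to a top-level "parent_child" key
            for kk, vv in v.items():
                ob['%s_%s' % (k, kk)] = vv
            del ob[k]
    return ob

# One record from the question, as it would appear on one line of a file
line = '{"d":4,"e":{"f":6,"g":7},"h":{"i":9,"j":[10,11,12]},"m":13}'
flat = convert(line)
df = pd.DataFrame([flat])
print(flat)
```

Only the first level of nesting is flattened: the list under "h" -> "j" stays a Python list in column h_j, because the list check applies to top-level values only.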

Save the whole dictionary to a .json file like this:

import json
with open('dict.json', 'w') as fp:
    json.dump(dict, fp, sort_keys=True, indent=4)

Then try this:

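# No lines=True here: that option expects one JSON object per line, but the saved file is a single indented object
df_json = pd.read_json(r'filepath\dict.json')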

If that doesn't work, we can work out a regular expression to parse the nested parts. Don't forget to import pandas and json; pandas can handle the complexity most of the time. This takes much less time than using regular expressions and converting to CSV files.
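One way to let pandas handle the nesting without regular expressions is pd.json_normalize. The sketch below is an assumption about what the asker wants (one row per item of "c", with "a" and "b" repeated on each row), not the answerer's own method. The second record is a made-up placeholder, and a temporary directory stands in for the real file path.

```python
import json
import os
import tempfile

import pandas as pd

data = {
    "a": 1,
    "b": 2,
    "c": [
        {"d": 4, "e": {"f": 6, "g": 7}, "h": {"i": 9, "j": [10, 11, 12]}, "m": 13},
        # Hypothetical second record: the question does not show dic2
        {"d": 5, "e": {"f": 8, "g": 9}, "h": {"i": 1, "j": [2, 3]}, "m": 14},
    ],
}

# Save and reload the dictionary, as in the answer above
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "dict.json")
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(data, fp, sort_keys=True, indent=4)
    with open(path, encoding="utf-8") as fp:
        loaded = json.load(fp)

# One row per item of "c"; nested dicts become "parent_child" columns,
# and the top-level scalars "a" and "b" are repeated as metadata
df = pd.json_normalize(loaded, record_path="c", meta=["a", "b"], sep="_")
print(df)
```

The list under "h" -> "j" is kept as a list in column h_j; whether to explode or join it depends on the layout of the expected table, which is not shown here.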
