
Converting a JSON file to a Pandas DataFrame

I have a JSON file which I converted to a dict, like below:

{'DATA': [{'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}

I wrote the following code to convert the JSON file to the dict format above:

import json

with open('data.json') as data_file:
    data = json.load(data_file)

Now I am trying to store this dict as a pandas DataFrame with the keys as column headers.

So I wrote the following:

df=pd.DataFrame.from_dict(data,orient='columns')

But I get all columns as one column.

df.head(3)

    DATA
0   {'COMPANY_SCHEMA': 'ABC.', 'CON...
1   {'COMPANY_SCHEMA': 'ABC', 'CON...
2   {'COMPANY_SCHEMA': 'ABC', 'CON...

I basically have a bunch of such JSON files in a folder, and I am trying to read all of them and store them in one pandas DataFrame, appended one below the other.

So I tried the above. My questions:

1) Why do I get the above error when converting to a pandas DataFrame?

2) Is there a better and faster way to read a bunch of such files: append them to one JSON and then add it to a pandas frame, or add them one by one?

I am not sure why you are getting the error you show, but I would skip converting the JSON to a dictionary and just use pd.read_json() instead.

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_json.html
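One caveat worth hedging here: because the records in the question sit under a top-level DATA key, reading the file directly would likely still leave a single DATA column. A minimal sketch of one way around that, assuming pandas 1.0 or later (where pd.json_normalize is available) and using a made-up two-record sample rather than the asker's real file:

```python
import json

import pandas as pd

# Hypothetical sample mirroring the question's layout:
# the records live in a list under a top-level "DATA" key.
text = '{"DATA": [{"ID": "Test", "SYSTEM_NUMBER": "50"}, {"ID": "Test2", "SYSTEM_NUMBER": "51"}]}'

data = json.loads(text)

# record_path tells json_normalize which key holds the list of records,
# so each record becomes a row and each record key becomes a column.
df = pd.json_normalize(data, record_path="DATA")
print(df.columns.tolist())
```

For flat records like these, this should give the same frame as pd.DataFrame(data['DATA']); json_normalize mainly pays off when the records themselves contain nested dicts.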

The data you provide is broken, so it is hard to reproduce. Try to provide a reproducible case when asking! ;-)

Anyway, I guess you just need to:

df = pandas.DataFrame(data['DATA'])

Where data is the dictionary you created with json.load().

A pandas.DataFrame() can be initialized with a list of dictionaries with no problem, but you need to pass the list of dictionaries itself, not the dict that wraps it.

If you are concerned about performance then yes: append to your list of dictionaries first, then convert the whole list to a DataFrame with pandas.DataFrame(list_of_dictionaries).

Your data is broken. After analyzing your question, I constructed one like the following:

{'DATA': [{'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}, {'COMPANY_SCHEMA': 'ABC', 'CONFIG_TYPE': 'rtype', 'IM_ID': '44f8d1b4_437e', 'MODIFIED_DATE': 'Unknown', 'ID': 'Test', 'CONFIG_KEY': 'posting_f', 'SYSTEM_NUMBER': '50', 'SYS_CONFIG_VALUE': '0', 'SYS_CONFIG_STRING_VALUE': 'true'}]}

You only gave the converted dict, not the JSON itself, and the JSON specification (RFC 7159) states that a string begins and ends with a quotation mark, which is " . So I just take the dict as an example.

I use ast.literal_eval() to safely get a data structure from a string; the result is the same dict that your json.load() returns. After getting a dict object, there are various ways to convert it to a DataFrame.

import ast
import pandas as pd


with open('data.dict') as data_file:
    dict_data = ast.literal_eval(data_file.read())

# The following methods all produce the same output:
pd.DataFrame(dict_data['DATA'])
pd.DataFrame.from_dict(dict_data['DATA'])
pd.DataFrame.from_records(dict_data['DATA'])
# print(pd.DataFrame(dict_data['DATA']))
  COMPANY_SCHEMA CONFIG_TYPE          IM_ID MODIFIED_DATE    ID CONFIG_KEY SYSTEM_NUMBER SYS_CONFIG_VALUE SYS_CONFIG_STRING_VALUE
0            ABC       rtype  44f8d1b4_437e       Unknown  Test  posting_f            50                0                    true
1            ABC       rtype  44f8d1b4_437e       Unknown  Test  posting_f            50                0                    true
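To illustrate the quoting point above, here is a small sketch on a made-up string (not the asker's data): a Python dict printed with single quotes parses with ast.literal_eval() but is rejected by the json module, which requires double-quoted strings.

```python
import ast
import json

# Hypothetical text: the printed form of a Python dict (single quotes), not valid JSON.
text = "{'DATA': [{'ID': 'Test'}]}"

# literal_eval only evaluates Python literals, so it is safe on untrusted text.
parsed = ast.literal_eval(text)

# The json module refuses the same text because JSON strings need double quotes.
try:
    json.loads(text)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# Dumping the dict with json produces real JSON that loads back to the same dict.
roundtrip = json.loads(json.dumps(parsed))
print(json_ok, roundtrip == parsed)
```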
1) Why the above error when converting to a pandas DataFrame?

If you mean why there is only one column, it is because pandas.DataFrame.from_dict() treats the keys of the dict as the DataFrame columns by default. When you do df=pd.DataFrame.from_dict(data), the only key of data is DATA, so there is only one column, named DATA.
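A short sketch of that behaviour, using a trimmed-down, made-up version of the dict (only two of the original keys are kept):

```python
import pandas as pd

# Hypothetical, shortened stand-in for the dict returned by json.load().
data = {"DATA": [{"ID": "Test", "SYSTEM_NUMBER": "50"},
                 {"ID": "Test", "SYSTEM_NUMBER": "50"}]}

# Passing the outer dict: its single key becomes the single column,
# and each record dict ends up as one cell value.
wrapped = pd.DataFrame.from_dict(data, orient="columns")
print(wrapped.columns.tolist())

# Passing the inner list of records: each record key becomes a column.
unwrapped = pd.DataFrame(data["DATA"])
print(unwrapped.columns.tolist())
```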

2) Is there a better and faster way to read a bunch of such files: append them to one JSON and then add it to a pandas frame, or add them one by one?

My solution is to concatenate all the dict data into one list:

import json
import pandas as pd

with open('data1.json') as data_file:
    dict_data1 = json.load(data_file)

....

data = dict_data1['DATA'] + dict_data2['DATA']

# Convert to pandas dataframe
pd.DataFrame(data)

# Dump the data to json file
with open('result.json', 'w') as fp:
    json.dump({'DATA': data}, fp)

You could use a for loop to simplify the procedure.
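Such a loop might look like the sketch below. The function name load_folder and the "*.json" pattern are assumptions for illustration, and it presumes every file has the same top-level DATA list as in the question.

```python
import json
from pathlib import Path

import pandas as pd


def load_folder(folder):
    """Collect the 'DATA' records of every *.json file in folder into one DataFrame.

    Hypothetical helper: assumes each file is a dict with a top-level 'DATA' list.
    """
    records = []
    # sorted() makes the row order deterministic across runs.
    for path in sorted(Path(folder).glob("*.json")):
        with open(path) as data_file:
            records.extend(json.load(data_file)["DATA"])
    # Build the DataFrame once at the end instead of growing it file by file.
    return pd.DataFrame(records)
```

Calling load_folder on the directory that holds the files would then return one frame with the rows of all files stacked in file-name order. Collecting plain dicts first and constructing the DataFrame once is generally cheaper than appending to a DataFrame inside the loop.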
