
Pandas: json multiple files and concatenate with weird format

I am having a hard time reading JSON files whose structure differs from what I am used to. The content of each JSON file is wrapped in brackets: [{content}].

This is what I normally do:

import os
import pandas as pd

data_dir = 'data/filesDump'
filenames = os.listdir(data_dir)
filenames = [os.path.join(data_dir, f) for f in filenames if f.endswith('.json')]

train_df = pd.concat([pd.read_json(file, encoding='UTF-8') for file in filenames],
                     ignore_index=True)

I get this error:

ValueError: Expected object or value

The only thing different about the thousands of JSON files I got is that the content is wrapped in brackets []. So I suspect this is what gives read_json a problem? Does anyone know how to load this format?

Sample (I may have made a mistake in the brackets, but this is just to give an idea):

[{"id":"value","title":"value","body":"text","categories":[{"id":value,"name":"name","keys":[{"id":value,"hits":["word1","word2"]},{"id":value,"hits":["word1","word2"]}],"date":value}]

Not all JSON files can be converted to a DataFrame; a specific format is required.

You should first convert your JSON files to Python structures with the standard json module; then you can modify the structure to fit the DataFrame constructor's requirements.

For example, if your JSON has an extra pair of brackets around the usual dictionary required to make a DataFrame, meaning the data is wrapped in a list as suggested by @Atreus, you can remove it by taking only the first element of the list:

import json
import pandas as pd

struct = json.loads('[{"A":{"0":1,"1":2,"2":3},"B":{"0":4,"1":5,"2":6}}]')
print(pd.DataFrame(struct[0]))

Output:

   A  B
0  1  4
1  2  5
2  3  6
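
Taking the first element suits a column-oriented dictionary like the one above. If instead the list holds one dictionary per row, as the sample in the question seems to, a minimal sketch is to pass the whole list to the DataFrame constructor. The record values below are made up for illustration and are not taken from the asker's data.

```python
import json

import pandas as pd

# Made-up records shaped loosely like the question's sample (placeholder values)
text = '[{"id": 1, "title": "a", "body": "text"}, {"id": 2, "title": "b", "body": "more text"}]'

records = json.loads(text)   # a Python list of dicts
df = pd.DataFrame(records)   # one row per dict, one column per key
print(df)
```

Nested fields such as "categories" would stay as Python lists inside a single column with this approach, so they may need further unpacking.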

So it turns out I did need to use the json module, as manu suggested, with one caveat: the file has to be opened with the utf-8-sig encoding:

with open(file, encoding='utf-8-sig') as f:
    data = json.load(f)
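
The utf-8-sig codec decodes UTF-8 and drops a leading byte order mark (BOM) if one is present, which suggests the files in question start with one. Putting the pieces together, a rough sketch of loading a whole directory could look like the following. The helper name load_json_dir and the demo files are hypothetical, and the sketch assumes every file holds a list of flat dictionaries, one per row.

```python
import json
import os
import tempfile

import pandas as pd


def load_json_dir(data_dir):
    """Hypothetical helper: load every .json file in data_dir into one DataFrame."""
    frames = []
    for name in sorted(os.listdir(data_dir)):
        if not name.endswith('.json'):
            continue
        path = os.path.join(data_dir, name)
        # utf-8-sig drops a leading byte order mark (BOM) if the file has one
        with open(path, encoding='utf-8-sig') as f:
            records = json.load(f)  # assumed: a list of dicts, one per row
        frames.append(pd.DataFrame(records))
    return pd.concat(frames, ignore_index=True)


# Demo: two small files written with a BOM, standing in for the real dump
with tempfile.TemporaryDirectory() as tmp:
    for i in range(2):
        with open(os.path.join(tmp, f'{i}.json'), 'w', encoding='utf-8-sig') as f:
            json.dump([{'id': i, 'title': f'title {i}'}], f)
    train_df = load_json_dir(tmp)

print(train_df)
```

With real data, the call would be something like load_json_dir('data/filesDump'), using the directory from the question.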
