Pandas: read multiple JSON files with an unusual format and concatenate them
I am having a hard time reading JSON files whose structure differs from what I am used to. The content of each file is wrapped in square brackets: [{content}].
This is what I normally do:
import os
import pandas as pd

data_dir = 'data/filesDump'
filenames = os.listdir(data_dir)
filenames = [os.path.join(data_dir, f) for f in filenames if f.endswith('.json')]
train_df = pd.concat([pd.read_json(file, encoding='UTF-8') for file in filenames],
                     ignore_index=True)
I get this error:
ValueError: Expected object or value
The only difference in the thousands of JSON files I received is that the content is wrapped in square brackets []. So I suspect this is what gives read_json a problem? Does anyone know how to load this format?
Sample (I may have made a mistake in the brackets, but it is only meant to give an idea):
[{"id":"value","title":"value","body":"text","categories":[{"id":value,"name":"name","keys":[{"id":value,"hits":["word1","word2"]},{"id":value,"hits":["word1","word2"]}],"date":value}]
Not all JSON files can be converted to a DataFrame; a specific format is required.
You should first convert your JSON files to Python structures with the standard json module; you can then modify the structure to fit the requirements of the DataFrame constructor.
For example, if your JSON has an extra pair of brackets around the dictionary normally used to build a DataFrame, meaning the data is wrapped in a list as suggested by @Atreus, you can remove it by taking only the first element of the list:
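import json
import pandas as pd

# The outer list holds a single dict, so index 0 gives the column-oriented data.
struct = json.loads('[{"A":{"0":1,"1":2,"2":3},"B":{"0":4,"1":5,"2":6}}]')
print(pd.DataFrame(struct[0]))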
Output:
   A  B
0  1  4
1  2  5
2  3  6
So it turns out I did need to use the json module, as manu pointed out, but with the following caveat:
json.load(open(file, encoding='utf-8-sig'))
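Putting the pieces together, here is a minimal sketch of loading a whole directory this way. It assumes each file holds a top-level list of record-like objects, as in the sample from the question, and that the original ValueError came from a leading byte-order mark, which the utf-8-sig fix suggests but the thread does not confirm. The helper names load_json_records and load_json_dir are hypothetical, chosen for illustration only.

```python
import json
import os

import pandas as pd


def load_json_records(path):
    """Load one JSON file whose top level is a list of objects: [{...}, ...]."""
    # 'utf-8-sig' strips a leading byte-order mark if one is present and
    # otherwise behaves like plain UTF-8 (assumption: the files may carry a BOM).
    with open(path, encoding="utf-8-sig") as fh:
        records = json.load(fh)
    # A list of dicts yields one row per dict; nested lists such as
    # "categories" are kept as plain Python objects in their column.
    return pd.DataFrame(records)


def load_json_dir(data_dir):
    """Concatenate every .json file found in data_dir into one DataFrame."""
    paths = sorted(
        os.path.join(data_dir, name)
        for name in os.listdir(data_dir)
        if name.endswith(".json")
    )
    return pd.concat([load_json_records(p) for p in paths], ignore_index=True)
```

Calling load_json_dir('data/filesDump') would then stand in for the pd.concat expression in the question. Note that pd.concat raises a ValueError if the directory contains no .json files, and that nested fields would still need flattening separately if they are required as columns.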