Python: problems correctly reading a nested JSON file
I'm having trouble correctly reading a nested JSON file into a dataframe. This is a sample of the JSON file with pharmaceutical products I'm working on:
[
[
{
"ScrapingOriginIdentifier": "N",
"ActiveSubstances": [
"A.C.T.H. pour préparations homéopathiques"
],
"ATC": null,
"Name": "A.C.T.H. BOIRON, degré de dilution compris entre 4CH et 30CH ou entre 8DH et 60DH",
"OtherFields": [
{
"Name": null,
"Value": "CIS: 6 499 638 6",
"Type": "string"
},
{
"Name": null,
"Value": "MA Holder since: 06/10/2021",
"Type": "string"
}
],
"Package": "1 tube de 4 g de granules",
"PharmaceuticalForm": "Granules",
},
{
"ScrapingOriginIdentifier": "N",
"ActiveSubstances": [
"A.C.T.H. pour préparations homéopathiques"
],
"ATC": null,
"Name": "A.C.T.H. BOIRON, degré de dilution compris entre 4CH et 30CH ou entre 8DH et 60DH",
"OtherFields": [
{
"Name": null,
"Value": "CIS: 6 499 638 6",
"Type": "string"
},
{
"Name": null,
"Value": "MA Holder since: 06/10/2021",
"Type": "string"
}
],
"Package": "1 tube de 20 g de pommade",
"PharmaceuticalForm": "Granules",
}
],
[
{
"ScrapingOriginIdentifier": "34009 341 687 6 5",
"ActiveSubstances": [],
"ATC": null,
"Name": "17 B ESTRADIOL BESINS-ISCOVESCO 0,06 POUR CENT, gel pour application cutanée en tube",
"OtherFields": [
{
"Name": null,
"Value": "CIS: 6 858 620 3",
"Type": "string"
},
{
"Name": null,
"Value": "Codes: 34009 341 687 6 5 or 341 687-6",
"Type": "string"
}
],
"Package": "1 tube(s) aluminium verni de 80 g avec applicateur polystyrène",
"PharmaceuticalForm": "Gel",
}
]
]
I can see the problem is that it's nested by ScrapingOriginIdentifier. I read the file using:
dataset = pd.read_json('data.json', orient='records')
And tried to 'shape' it correctly using:
dataset = pd.json_normalize(dataset)
This still did not work. How can I read the file correctly in order to get all of the content?
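First, a note on the sample: the unquoted null values are valid JSON and load as None in Python, so they do not need to be quoted. What is not valid is the trailing comma after the last key of each object (for example after "PharmaceuticalForm": "Granules"), which a strict parser rejects. Second, the structure of your JSON is not suitable for creating a dataframe directly. The structure is the following: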
[
[
{ "ScrapingOriginIdentifier": "...", ...},
{ "ScrapingOriginIdentifier": "...", ...},
],
[
{ "ScrapingOriginIdentifier": "...", ...},
]
]
While it should be constructed like this:
[
{ "ScrapingOriginIdentifier": "...", ...},
{ "ScrapingOriginIdentifier": "...", ...},
{ "ScrapingOriginIdentifier": "...", ...}
]
Please consider restructuring your list this way:
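import json
import pandas as pd

# Load the file: a list of lists of records
with open('data.json', encoding='utf-8') as f:
    nested = json.load(f)

# Flatten the inner lists into one flat list of records
records = [item for group in nested for item in group]

df = pd.DataFrame(records)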
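As a self-contained illustration of the flattening step, here is a minimal sketch that uses an inline string in place of the file. The data is abbreviated from the sample in the question, and the variable names (raw, nested, records, other) are only illustrative. It also shows one possible way to expand the nested OtherFields lists with pd.json_normalize, which goes beyond what the question strictly asks for.

```python
import json
from itertools import chain

import pandas as pd

# Abbreviated stand-in for the file contents, same nesting as the sample
# (a list of lists of records). Note that bare null is valid JSON.
raw = """
[
  [
    {"ScrapingOriginIdentifier": "N", "ATC": null,
     "Package": "1 tube de 4 g de granules",
     "OtherFields": [{"Name": null, "Value": "CIS: 6 499 638 6", "Type": "string"}]},
    {"ScrapingOriginIdentifier": "N", "ATC": null,
     "Package": "1 tube de 20 g de pommade",
     "OtherFields": [{"Name": null, "Value": "CIS: 6 499 638 6", "Type": "string"}]}
  ],
  [
    {"ScrapingOriginIdentifier": "34009 341 687 6 5", "ATC": null,
     "Package": "1 tube(s) aluminium verni de 80 g",
     "OtherFields": [{"Name": null, "Value": "CIS: 6 858 620 3", "Type": "string"}]}
  ]
]
"""

nested = json.loads(raw)

# Flatten the outer list of lists into one flat list of record dicts.
records = list(chain.from_iterable(nested))

# One row per product; OtherFields stays a column of lists here.
df = pd.DataFrame(records)

# Optional: one row per OtherFields entry, carrying along two parent keys.
other = pd.json_normalize(
    records,
    record_path="OtherFields",
    meta=["ScrapingOriginIdentifier", "Package"],
    record_prefix="OtherFields.",
)

print(df[["ScrapingOriginIdentifier", "Package"]])
print(other)
```

Whether to keep OtherFields as a column of lists or expand it into its own table depends on how you plan to query the data afterwards.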