python nested json to csv/xlsx with specified headers
With json like below, which is an array of objects at the outermost level, with further nested arrays of objects:
data = [{"a": [{"a1": [{"id0": [{"aa": [{"aaa": 97}, {"aab": "one"}], "ab": [{"aba": 97}, {"abb": ["one", "two"]}]}]}, {"id1": [{"aa": [{"aaa": 23}]}]}]}, {"a2": []}]}, {"b": [{"b1": [{"Common": [{"bb": [{"value": 4}]}]}]}]}]
I need to write this to a csv (or .xlsx) file.
What I've tried so far:
import csv

data_file = open('data_file.csv', 'w')
csv_writer = csv.writer(data_file)
for row in data:
    csv_writer.writerow(row)
data_file.close()
This gives an empty file 'data_file.csv'.
Also, how do I add headers to the CSV? I have the headers stored in a list as below:
hdrs = ['Section', 'Subsection', 'pId', 'Group', 'Parameter', 'Value']
These correspond to the five levels of keys, plus the value.
Expected CSV output:
+---------+------------+--------+-------+-----------+----------+
| Section | Subsection | pId | Group | Parameter | Value |
+---------+------------+--------+-------+-----------+----------+
| a | a1 | id0 | aa | aaa | 97 |
| a | a1 | id0 | aa | aab | one |
| a | a1 | id0 | ab | aba | 97 |
| a | a1 | id0 | ab | abb | one, two |
| a | a1 | id1 | aa | aaa | 23 |
| a | a2 | | | | |
| b | b1 | Common | bb | value | 4 |
+---------+------------+--------+-------+-----------+----------+
The following code parses the provided data into the expected format.
from typing import List

def parse_recursive(dat) -> List[List]:
    ret = []
    if type(dat) is list:
        for item in dat:
            if type(item) == dict:
                for k in item:
                    # print(k, item[k], sep=" # ")  # debug print
                    if item[k] == []:  # empty list
                        ret.append([k])
                    else:
                        for l in parse_recursive(item[k]):
                            # print(k, l, sep=" : ")  # debug print
                            ret.append([k] + l)  # always returns List of List
            else:  # Right now only possibility is string eg. "one", "two"
                return [[",".join(dat)]]
    else:  # can be int or string eg. 97, "23"
        return [[dat]]
    return ret

def write_to_csv(file_name: str, fields: List, row_data: List[List]):
    import csv
    # newline='' is what the csv docs recommend, to avoid blank lines on Windows
    with open(file_name, 'w', newline='') as csvfile:
        # creating a csv writer object
        csvwriter = csv.writer(csvfile)
        # writing the fields
        csvwriter.writerow(fields)
        # writing the data rows
        csvwriter.writerows(row_data)

if __name__ == "__main__":
    org_data = [
        {"a": [
            {"a1": [
                {"id0": [
                    {
                        "aa": [
                            {"aaa": 97},
                            {"aab": "one"}],
                        "ab": [
                            {"aba": 97},
                            {"abb": ["one", "two"]}
                        ]
                    }
                ]},
                {"id1": [
                    {"aa": [
                        {"aaa": 23}]}]}
            ]},
            {"a2": []}
        ]},
        {"b": [{"b1": [{"Common": [{"bb": [{"value": 4}]}]}]}]}]
    print(parse_recursive(org_data))  # Debug
    file_name = "data_file.csv"
    fields = ['Section', 'Subsection', 'pId', 'Group', 'Parameter', 'Value']
    write_to_csv(file_name, fields, parse_recursive(org_data))
parse_recursive tries to parse a dictionary of arbitrary depth according to the rule I deduced from your input and output formats.
Following is the output of parse_recursive for your provided input:
mahorir@mahorir-Vostro-3446:~/Desktop$ python3 so.py
[['a', 'a1', 'id0', 'aa', 'aaa', 97], ['a', 'a1', 'id0', 'aa', 'aab', 'one'], ['a', 'a1', 'id0', 'ab', 'aba', 97], ['a', 'a1', 'id0', 'ab', 'abb', 'one,two'], ['a', 'a1', 'id1', 'aa', 'aaa', 23], ['a', 'a2'], ['b', 'b1', 'Common', 'bb', 'value', 4]]
write_to_csv is a trivial function that writes to a csv file.
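One detail in the output above: the 'a2' row has only two cells, while the header has six. The csv module writes short rows as they are, so the file is still valid, but some readers expect rectangular data. A minimal sketch, assuming a hypothetical helper named pad_rows (not part of the code above), that pads every row to the header width:

```python
import csv
import io

def pad_rows(rows, width, fill=""):
    """Return copies of rows, right-padded with `fill` up to `width` cells."""
    return [list(row) + [fill] * (width - len(row)) for row in rows]

fields = ['Section', 'Subsection', 'pId', 'Group', 'Parameter', 'Value']
rows = [['a', 'a1', 'id1', 'aa', 'aaa', 23], ['a', 'a2']]

padded = pad_rows(rows, len(fields))

# Write to an in-memory buffer instead of a file, just to inspect the result.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(fields)
writer.writerows(padded)
print(buf.getvalue())
```

In the real script the same call would sit between parse_recursive and write_to_csv; whether the padding is worth it depends on what consumes the file.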
This was kind of a fun problem... There are really two problems with the formatting here:
1. The data is lists of dicts, where they really just wanted dictionaries, e.g. they wanted {"foo": 1, "bar": 2} but instead formatted it as [{"foo": 1}, {"bar": 2}].
   a. I'm not judging here. There may be reasons why they did this. It just makes it a bit annoying for us to parse.
2. The data is sometimes truncated; if it is usually 5 levels deep, sometimes, if they don't have data beyond a point, they just omit it, e.g. 'a2' in your example.
So I'll show two possible approaches to solving these problems.
This solution is a bit different from the other one mentioned here. Let me know what you think:
import pandas as pd
from copy import deepcopy

hdrs = ['Section', 'Subsection', 'pId', 'Group', 'Parameter', 'Value']
js = [{"a": [{"a1": [{"id0": [{"aa": [{"aaa": 97}, {"aab": "one"}],
                               "ab": [{"aba": 98}, {"abb": ["one", "two"]}]}]},
                     {"id1": [{"aa": [{"aaa": 23}]}]}
                     ]},
             {"a2": []}
             ]},
      {"b": [{"b1": [{"Common": [{"bb": [{"value": 4}]}]}]}]}]

def list_to_dict(lst):
    """convert a list of dicts as you have to a single dict

    The idea here is that you have a bunch of structures that look
    like [{x: ...}, {y: ...}] that should probably have been stored as
    {x:..., y:...}. So this function does that (but just one level in).

    Note:
    If there is a duplicate key in one of your dicts (meaning you have
    something like [{x:...},...,{x:...}]), then this function will overwrite
    it without warning!
    """
    d = {}
    for new_d in lst:
        d.update(new_d)
    return d

def recursive_parse(lst, levels):
    "Parse the nested json into a single pandas dataframe"
    name = levels.pop(0)  # I should have used a counter instead
    d = list_to_dict(lst)  # get a sensible dict instead of the list of dicts
    if len(levels) <= 1:  # meaning there are no more levels to be parsed.
        if len(d) == 0:
            d = {'': ''}  # to handle the uneven depths (e.g. think 'a2')
        return pd.Series(d, name=levels[-1])
    if len(d) == 0:  # again to handle the uneven depths of json
        d = {'': []}
    # below is a dict comprehension to recursively parse the thing.
    d = {k: recursive_parse(v, deepcopy(levels)) for k, v in d.items()}
    return pd.concat(d)

def json_to_df(js, headers):
    "calls recursive_parse, and then adds the column names and whatnot"
    df = recursive_parse(js, deepcopy(headers))
    df.index.names = headers[:-1]
    df = df.reset_index()
    return df

df = json_to_df(js, hdrs)
print(df)  # in a Jupyter notebook, display(df) renders the same frame as a table
And the output is exactly the dataframe you want (but with an index column you may not want). If you write it to csv afterwards, do so like this:
df.to_csv('path/to/desired/file.csv', index=False)
Does that make sense?
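The docstring of list_to_dict warns that a repeated key is overwritten silently. If that matters for your data, a stricter variant can raise instead. This is only a sketch; the name list_to_dict_strict is hypothetical and not part of the answer above:

```python
def list_to_dict_strict(lst):
    """Merge [{x: ...}, {y: ...}] into {x: ..., y: ...}, raising on repeated keys."""
    merged = {}
    for single in lst:
        for key, value in single.items():
            if key in merged:
                # Fail loudly rather than dropping the earlier value.
                raise ValueError(f"duplicate key {key!r} in list of dicts")
            merged[key] = value
    return merged

print(list_to_dict_strict([{"aaa": 97}, {"aab": "one"}]))
```

Swapping it in for list_to_dict would make the parse stop at the first collision, which is probably preferable to losing rows without noticing.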
Better version (not using pandas)...
import csv

hdrs = ['Section', 'Subsection', 'pId', 'Group', 'Parameter', 'Value']
js = [{"a": [{"a1": [{"id0": [{"aa": [{"aaa": 97}, {"aab": "one"}],
                               "ab": [{"aba": 98}, {"abb": ["one", "two"]}]}]},
                     {"id1": [{"aa": [{"aaa": 23}]}]}
                     ]},
             {"a2": []}
             ]},
      {"b": [{"b1": [{"Common": [{"bb": [{"value": 4}]}]}]}]}]

def list_of_dicts_to_lists(lst, n_levels=len(hdrs)):
    if n_levels == 1:
        if isinstance(lst, list):
            if len(lst) == 0:  # we fill the shorter ones with empty lists
                lst = None  # replacing them back to None
            else:  # [1, 2] => "1,2"
                lst = ','.join(str(x) for x in lst if x is not None)
        # the later ones are going to be lists of lists, so let's start out
        # that way to keep everything consistent.
        return [[lst]]
    if len(lst) == 0:
        lst = [{None: []}]  # filling with an empty list
    output = []
    for d in lst:
        for k, v in d.items():
            tmp = list_of_dicts_to_lists(v, n_levels - 1)
            for x in tmp:
                output.append([k] + x)
    return output

def to_csv(values, header, outfile):
    with open(outfile, 'w', newline='') as csv_file:
        # pretty much straight from the docs @
        # https://docs.python.org/3.7/library/csv.html
        csv_writer = csv.writer(csv_file, quoting=csv.QUOTE_MINIMAL)
        csv_writer.writerow(header)
        for line in values:
            csv_writer.writerow(line)
    return True

rows = list_of_dicts_to_lists(js)
to_csv(rows, hdrs, 'tmp.csv')
I see now that this solution is super similar to the other answer here... My bad.
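A note on the joined value 'one,two': because it contains the delimiter, csv.QUOTE_MINIMAL wraps it in double quotes, so it still reads back as a single field. A small sketch using an in-memory buffer (the row is copied from the example data; nothing here touches the file written above):

```python
import csv
import io

row = ['a', 'a1', 'id0', 'ab', 'abb', 'one,two']

# Write one row into an in-memory buffer with the same quoting mode as to_csv.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow(row)
line = buf.getvalue()
print(line)  # the last field is wrapped in double quotes

# Reading the line back recovers six fields, not seven.
parsed = next(csv.reader(io.StringIO(line)))
print(len(parsed), parsed[-1])
```

If you want the cell to read "one, two" as in the expected table, joining with ', ' instead of ',' should behave the same way with respect to quoting.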