I have a .gz file, and inside it I have JSON objects like:
input:
{ "name":"John", "age":21, "gender":"male" }
{ "name":"Mike", "age":29, "gender":"male" }
{ "name":"Tim", "age":20, "gender":"male" }
{ "name":"Kim", "age":39, "gender":"female" }
Note: there are no commas at the end of each JSON object.
I use the following to save it to a dataframe:
import pandas as pd
data_location = 's3://myBucket/myFolder'
raw_json_data = pd.read_json(data_location, lines=True)
raw_json_data.head(2)
Question: I want to convert it to CSV, maybe like this:
expected output:
name, age, gender
John, 21, male
Mike, 29, male
Tim, 20, male
Kim, 39, female
I tried this, but it did not give the expected output. Am I missing something?
df=pd.read_json(raw_json_data)
df.to_csv('results.csv')
First, you can create a dataframe with a single column that holds each dictionary as a string:
import json
from io import StringIO
df = pd.read_csv(StringIO("""
{ "name":"John", "age":21, "gender":"male" }
{ "name":"Mike", "age":29, "gender":"male" }
{ "name":"Tim", "age":20, "gender":"male" }
{ "name":"Kim", "age":39, "gender":"female" }
"""), delimiter='|', header=None) # instead of StringIO part, you can have the path of input file
df
0
0 { "name":"John", "age":21, "gender":"male" }
1 { "name":"Mike", "age":29, "gender":"male" }
2 { "name":"Tim", "age":20, "gender":"male" }
3 { "name":"Kim", "age":39, "gender":"female" }
You can use json_normalize to convert each dictionary to a dataframe:
def func(x):
    # x is one row; its only cell holds the JSON string
    result = pd.json_normalize(json.loads(x.iloc[0]))
    return result
result = df.apply(func, axis=1)
result
0 name age gender
0 John 21 male
1 name age gender
0 Mike 29 male
2 name age gender
0 Tim 20 male
3 name age gender
0 Kim 39 female
dtype: object
The above output is a series of dataframes. To convert it to a single dataframe, you can do the following:
pd.concat([r for r in result], ignore_index=True)
name age gender
0 John 21 male
1 Mike 29 male
2 Tim 20 male
3 Kim 39 female
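The steps above stop at the combined dataframe, while the question asks for a CSV. A minimal sketch of a shorter route, assuming every row is a valid JSON object: parse each row with json.loads, build the dataframe from the list of dicts, and call to_csv. The `lines` list is a stand-in for the rows of the input file, and the names `records`, `flat` and `csv_text` are only illustrative.

```python
import json

import pandas as pd

# Sample rows standing in for the lines of the input file (assumption for illustration)
lines = [
    '{ "name":"John", "age":21, "gender":"male" }',
    '{ "name":"Mike", "age":29, "gender":"male" }',
    '{ "name":"Tim", "age":20, "gender":"male" }',
    '{ "name":"Kim", "age":39, "gender":"female" }',
]

# Parse each row into a dict, then build one dataframe from the list of dicts
records = [json.loads(line) for line in lines]
flat = pd.DataFrame(records)

# With no path argument, to_csv returns the CSV as a string;
# pass a filename such as 'results.csv' to write a file instead
csv_text = flat.to_csv(index=False)
print(csv_text)
```

Passing index=False keeps the row numbers out of the CSV, which matches the expected output in the question.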
For a .gz file with a .json file inside, once the .json file has been extracted:
Use pathlib methods to read the file in, and then split the rows into a list of strings.
Path('test.json'): 'test.json' can be the path to the file if it's in a different directory.
Map the strings to dicts with ast.literal_eval.
import pandas as pd
from pathlib import Path
from ast import literal_eval
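# read the file in using the pathlib methods and split it into lines
text = Path('test.json').read_text().splitlines()
# map the non-blank strings to dicts (a blank line would make literal_eval raise)
text = map(literal_eval, (line for line in text if line.strip()))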
# load the list of dicts into a dataframe
df = pd.DataFrame(text)
# save to a csv
df.to_csv('results.csv', index=False)
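One caveat with literal_eval: it parses Python literals, so it rejects JSON-only tokens such as true, false and null. A hedged alternative sketch parses each non-blank line with json.loads instead. The sample text below, including the `active` field, is made up for illustration and is not part of the original data.

```python
import json
from ast import literal_eval

import pandas as pd

# Made-up sample in the same one-object-per-line layout, with JSON-only tokens
text = (
    '{ "name":"John", "age":21, "active":true }\n'
    '{ "name":"Kim", "age":null, "active":false }\n'
)

# literal_eval only understands Python literals, so `true` is rejected
try:
    literal_eval(text.splitlines()[0])
    literal_eval_ok = True
except (ValueError, SyntaxError):
    literal_eval_ok = False

# json.loads handles each line on its own; blank lines are skipped
rows = [json.loads(line) for line in text.splitlines() if line.strip()]
df_json = pd.DataFrame(rows)
```

For data like the sample in the question, which only contains strings and integers, both parsers give the same dicts.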
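Reading directly from the .gz file:
Loading the whole file with the json module is problematic because the data is not a single properly formed .json document (it is one object per line), so each line is parsed separately.
import gzip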
import pandas as pd
from ast import literal_eval
# open the gzip file in text mode and parse each non-blank line
with gzip.open('testing.json.gz', 'rt', encoding='UTF-8') as gz_file:
    data = [literal_eval(v.strip()) for v in gz_file if v.strip()]
# create the dataframe
df = pd.DataFrame(data)
# save to a csv
df.to_csv('results.csv', index=False)
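It is also worth noting why the attempt in the question failed: read_json with lines=True already returns a dataframe, so passing that dataframe to read_json a second time is the step that breaks. A minimal sketch, assuming a local gzipped file with one JSON object per line. The temporary directory and file names are placeholders, and an S3 path like the one in the question would additionally need S3 support installed.

```python
import gzip
import os
import tempfile

import pandas as pd

# Build a small gzipped sample so the sketch is self-contained (placeholder file names)
tmp_dir = tempfile.mkdtemp()
gz_path = os.path.join(tmp_dir, 'testing.json.gz')
with gzip.open(gz_path, 'wt', encoding='UTF-8') as f:
    f.write('{ "name":"John", "age":21, "gender":"male" }\n')
    f.write('{ "name":"Kim", "age":39, "gender":"female" }\n')

# lines=True reads one JSON object per line; compression is inferred from the .gz suffix
df_lines = pd.read_json(gz_path, lines=True)

# Write the dataframe itself to CSV; it does not need to go through read_json again
csv_path = os.path.join(tmp_dir, 'results.csv')
df_lines.to_csv(csv_path, index=False)
```

In terms of the question's own variables, this amounts to calling raw_json_data.to_csv('results.csv', index=False) directly.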