
Pandas - convert a JSON-formatted string from a DataFrame column into a DataFrame

This question may be complicated :-) and I haven't been able to find an answer after hours of searching.

I have JSON-like data in one of the columns of a DataFrame:

   population  postcode                                    salesGrowthList
0        3507      2250  [{'medianSoldPrice': 300000.0, 'annualGrowth':...
1        3507      2250  [{'medianSoldPrice': 353000.0, 'annualGrowth':...
2        3507      2250  [{'medianSoldPrice': 0.0, 'annualGrowth': 0.0,...
3        3507      2250  [{'medianSoldPrice': 0.0, 'annualGrowth': 0.0,...

A sample value from 'salesGrowthList' is shown below. It is stored as a string, but its content is structured like JSON:

"[{'medianSoldPrice': 300000.0, 'annualGrowth': 0.0, 'numberSold': 19, 'year': 2014}, {'medianSoldPrice': 347000.0, 'annualGrowth': 0.15666666666666668, 'numberSold': 27, 'year': 2015}, {'medianSoldPrice': 371000.0, 'annualGrowth': 0.069164265129683, 'numberSold': 12, 'year': 2016}, {'medianSoldPrice': 410000.0, 'annualGrowth': 0.10512129380053908, 'numberSold': 15, 'year': 2017}, {'medianSoldPrice': 0.0, 'annualGrowth': 0.0, 'numberSold': 6, 'year': 2018}, {'medianSoldPrice': 411000.0, 'annualGrowth': 0.0, 'numberSold': 10, 'year': 2019}]"

Now I would like to build a new DataFrame from this value. How can this be done?

You could parse the string with the json module and pass the result to pandas.DataFrame. JSON requires double-quoted strings, so the single quotes are replaced first:

>>> import json
>>> import pandas as pd
>>> x
"[{'medianSoldPrice': 300000.0, 'annualGrowth': 0.0, 'numberSold': 19, 'year': 2014}, {'medianSoldPrice': 347000.0, 'annualGrowth': 0.15666666666666668, 'numberSold': 27, 'year': 2015}, {'medianSoldPrice': 371000.0, 'annualGrowth': 0.069164265129683, 'numberSold': 12, 'year': 2016}, {'medianSoldPrice': 410000.0, 'annualGrowth': 0.10512129380053908, 'numberSold': 15, 'year': 2017}, {'medianSoldPrice': 0.0, 'annualGrowth': 0.0, 'numberSold': 6, 'year': 2018}, {'medianSoldPrice': 411000.0, 'annualGrowth': 0.0, 'numberSold': 10, 'year': 2019}]"
>>> d = json.loads(x.replace("'", '"'))
>>> df = pd.DataFrame(d)
>>> df
   medianSoldPrice  annualGrowth  numberSold  year
0         300000.0      0.000000          19  2014
1         347000.0      0.156667          27  2015
2         371000.0      0.069164          12  2016
3         410000.0      0.105121          15  2017
4              0.0      0.000000           6  2018
5         411000.0      0.000000          10  2019
>>> 
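
The quote replacement above works for this sample, but it would break if a value ever contained an apostrophe or a Python-only literal such as None. Because the string is really a Python literal rather than strict JSON, ast.literal_eval from the standard library may be a safer fit. A minimal sketch, assuming the strings always look like the sample (shortened here to two records):

```python
import ast
import pandas as pd

# Shortened copy of the sample string from the question (first two records only).
x = (
    "[{'medianSoldPrice': 300000.0, 'annualGrowth': 0.0, 'numberSold': 19, 'year': 2014}, "
    "{'medianSoldPrice': 347000.0, 'annualGrowth': 0.15666666666666668, 'numberSold': 27, 'year': 2015}]"
)

# literal_eval evaluates Python literals only (lists, dicts, numbers, strings),
# so single-quoted keys are accepted and no arbitrary code is executed.
records = ast.literal_eval(x)

# A list of dicts becomes one row per dict, one column per key.
df = pd.DataFrame(records)
print(df)
```

If the column might hold malformed strings, literal_eval raises ValueError or SyntaxError, which can be caught per row.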

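and then, if needed, store the parsed lists back in the original DataFrame. Note that df holds the records of just one row, so it cannot be assigned to the column directly; instead, apply the same conversion to every row of the column:

>>> orig_df['salesGrowthList'] = orig_df['salesGrowthList'].apply(lambda s: json.loads(s.replace("'", '"')))
>>> 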
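
To build one DataFrame covering every row of the column, one possible approach is to parse each string, use explode to get one row per record, and expand the dicts with pd.json_normalize. This is only a sketch: the two-row orig_df below is made-up stand-in data in the shape shown in the question, and parsed, exploded, details and flat are hypothetical names. It assumes pandas 1.0 or later.

```python
import ast
import pandas as pd

# Stand-in for the original frame; the record values are illustrative only.
orig_df = pd.DataFrame({
    "population": [3507, 3507],
    "postcode": [2250, 2250],
    "salesGrowthList": [
        "[{'medianSoldPrice': 300000.0, 'annualGrowth': 0.0, 'numberSold': 19, 'year': 2014}]",
        "[{'medianSoldPrice': 353000.0, 'annualGrowth': 0.0, 'numberSold': 10, 'year': 2014}, "
        "{'medianSoldPrice': 0.0, 'annualGrowth': 0.0, 'numberSold': 6, 'year': 2018}]",
    ],
})

# Parse each string into a list of dicts.
parsed = orig_df["salesGrowthList"].apply(ast.literal_eval)

# One output row per dict; population and postcode are repeated for each record.
exploded = (
    orig_df.assign(salesGrowthList=parsed)
    .explode("salesGrowthList")
    .reset_index(drop=True)
)

# Expand the dicts into columns and attach them next to the repeated columns.
details = pd.json_normalize(exploded["salesGrowthList"].tolist())
flat = pd.concat([exploded.drop(columns="salesGrowthList"), details], axis=1)
print(flat)
```

Keeping population and postcode alongside each record makes it possible to group or filter the yearly figures by postcode afterwards.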
