Extracting data from a JSON column in a pandas DataFrame

I have a JSON data file that I have converted to a pandas DataFrame. The columns are as follows:

Index(['id', 'title', 'abstract', 'content', 'metadata'], dtype='object')

I am particularly interested in the 'metadata' column. An element of that column looks like this:

df_json.loc[78, 'metadata']
"{'classification': {'name': 'Manufacturing, Transport & Logistics'}, 'subClassification': {'name': 'Warehousing, Storage & Distribution'}, 'area': {'name': 'Southern Suburbs & Logan'}, 'location': {'name': 'Brisbane'}, 'suburb': {'name': 'Milton'}, 'workType': {'name': 'Casual/Vacation'}}"

I want to create new columns from the information in the 'metadata' column, for example location. I am not sure how to extract it and add it to the same DataFrame as extra columns such as location.

    id  title   abstract    content metadata    clean_content
0   38915469    Recruitment Consultant  We are looking for someone to focus purely on ...   <HTML><p>Are you looking to join a thriving bu...   {'standout': {'bullet1': 'Join a Sector that i...   Are you looking to join a thriving business th...
1   38934839    Computers Salesperson - Coburg  Passionate about exceptional customer service?...   <HTML><p>&middot;&nbsp;&nbsp;Casual hours as r...   {'additionalSalaryText': 'Attractive Commissio...   middotnbspnbspCasual hours as required transit...
2   38946054    Senior Developer | SA   Readifarians are known for discovering the lat...   <HTML><p>Readify helps organizations 

You can use pandas.json_normalize.

Applying it to your string:

 pd.json_normalize(eval(json_string)) 
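Since `eval` executes whatever the string contains, a safer variant may be worth considering. The following is a minimal sketch, assuming each 'metadata' value is a Python-literal string like the one in the question (single quotes, so `json.loads` would reject it). It uses `ast.literal_eval` from the standard library in place of `eval`; the variable names `record` and `flat` are only illustrative.

```python
import ast

import pandas as pd

# Sample value copied from the question: a Python-literal string, not strict JSON.
json_string = (
    "{'classification': {'name': 'Manufacturing, Transport & Logistics'}, "
    "'subClassification': {'name': 'Warehousing, Storage & Distribution'}, "
    "'area': {'name': 'Southern Suburbs & Logan'}, "
    "'location': {'name': 'Brisbane'}, "
    "'suburb': {'name': 'Milton'}, "
    "'workType': {'name': 'Casual/Vacation'}}"
)

# literal_eval only accepts literals (dicts, lists, strings, numbers, ...),
# so it cannot run arbitrary code the way eval can.
record = ast.literal_eval(json_string)

# json_normalize flattens nested dicts into dotted column names.
flat = pd.json_normalize(record)

print(list(flat.columns))
print(flat.loc[0, "location.name"])
```

The result is a one-row DataFrame whose columns are named after the nested keys, such as `location.name` and `suburb.name`.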

If this works for you, you can then apply it to the whole column:

 df["metadata"].apply(lambda x: pd.json_normalize(eval(x)))
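Note that calling `apply` with `json_normalize` this way typically yields a Series whose elements are one-row DataFrames, not new columns beside the original data. One possible way to get the extracted fields next to the existing columns is sketched below. It assumes every 'metadata' value is a Python-literal string; the two-row `df_json` here is a hypothetical stand-in with shortened values, and the second row is invented purely to show how missing keys behave.

```python
import ast

import pandas as pd

# Hypothetical stand-in for df_json; only 'id' and 'metadata' are included.
df_json = pd.DataFrame({
    "id": [1, 2],
    "metadata": [
        "{'location': {'name': 'Brisbane'}, 'suburb': {'name': 'Milton'}}",
        "{'location': {'name': 'Melbourne'}, 'workType': {'name': 'Full Time'}}",
    ],
})

# Parse each string into a dict without executing arbitrary code.
parsed = df_json["metadata"].apply(ast.literal_eval)

# Flatten all dicts at once; keys missing from a row become NaN.
flat = pd.json_normalize(parsed.tolist())

# Align the flattened frame with the original index before joining.
flat.index = df_json.index
df_out = df_json.join(flat)

print(df_out[["id", "location.name"]])
```

Normalizing the whole list in one call, instead of row by row, gives a single DataFrame with a consistent set of columns, which can then be joined back on the index. A column such as `location.name` can be renamed afterwards with `DataFrame.rename` if a shorter name is preferred.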
