简体   繁体   中英

Loading JSON data into pandas data frame and creating custom columns

Here is example JSON im working with.

{
    ":@computed_region_amqz_jbr4": "587",
    ":@computed_region_d3gw_znnf": "18",
    ":@computed_region_nmsq_hqvv": "55",
    ":@computed_region_r6rf_p9et": "36",
    ":@computed_region_rayf_jjgk": "295",
    "arrests": "1",
    "county_code": "44",
    "county_code_text": "44",
    "county_name": "Mifflin",
    "fips_county_code": "087",
    "fips_state_code": "42",
    "incident_count": "1",
    "lat_long": {
      "type": "Point",
      "coordinates": [
        -77.620031,
        40.612749
      ]
    }

I have been able to pull out select columns I want except I'm having troubles with "lat_long". So far my code looks like:

# PRINTS OUT SPECIFIED COLUMNS
col_titles = ['county_name', 'incident_count', 'lat_long']
df = df.reindex(columns=col_titles)

However 'lat_long' is added to the data frame as such: {'type': 'Point', 'coordinates': [-75.71107, 4...

I thought once I figured out how properly add the coordinates to the data frame I would then create two seperate columns, one for latitude and one for longitude.

Any help with this matter would be appreciated. Thank you.

If I don't misunderstood your requirements then you can try this way with json_normalize . I just added the demo for single json, you can use apply or lambda for multiple datasets.

import pandas as pd
from pandas.io.json import json_normalize
df = {":@computed_region_amqz_jbr4":"587",":@computed_region_d3gw_znnf":"18",":@computed_region_nmsq_hqvv":"55",":@computed_region_r6rf_p9et":"36",":@computed_region_rayf_jjgk":"295","arrests":"1","county_code":"44","county_code_text":"44","county_name":"Mifflin","fips_county_code":"087","fips_state_code":"42","incident_count":"1","lat_long":{"type":"Point","coordinates":[-77.620031,40.612749]}}

df = pd.io.json.json_normalize(df)
df_modified = df[['county_name', 'incident_count', 'lat_long.type']] 
df_modified['lat'] = df['lat_long.coordinates'][0][0]
df_modified['lng'] = df['lat_long.coordinates'][0][1]
print(df_modified)

Here is how you can do it as well:

df1 = pd.io.json.json_normalize(df)

pd.concat([df1, df1['lat_long.coordinates'].apply(pd.Series) \
  .rename(columns={0: 'lat', 1: 'long'})], axis=1) \
  .drop(columns=['lat_long.coordinates', 'lat_long.type'])

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM