Loading JSON data into pandas data frame and creating custom columns
Here is an example of the JSON I'm working with.
{
    ":@computed_region_amqz_jbr4": "587",
    ":@computed_region_d3gw_znnf": "18",
    ":@computed_region_nmsq_hqvv": "55",
    ":@computed_region_r6rf_p9et": "36",
    ":@computed_region_rayf_jjgk": "295",
    "arrests": "1",
    "county_code": "44",
    "county_code_text": "44",
    "county_name": "Mifflin",
    "fips_county_code": "087",
    "fips_state_code": "42",
    "incident_count": "1",
    "lat_long": {
        "type": "Point",
        "coordinates": [
            -77.620031,
            40.612749
        ]
    }
}
I have been able to pull out the columns I want, but I'm having trouble with "lat_long". So far my code looks like:
# PRINTS OUT SPECIFIED COLUMNS
col_titles = ['county_name', 'incident_count', 'lat_long']
df = df.reindex(columns=col_titles)
However, 'lat_long' is added to the data frame like this:
{'type': 'Point', 'coordinates': [-75.71107, 4...
I thought that once I figured out how to properly add the coordinates to the data frame, I would then create two separate columns, one for latitude and one for longitude.
Any help with this matter would be appreciated. Thank you.
If I haven't misunderstood your requirements, you can try it this way with json_normalize. I have only added a demo for a single JSON record; you can use apply or a lambda for multiple records.
import pandas as pd

# The single JSON record from the question, as a Python dict
data = {":@computed_region_amqz_jbr4":"587",":@computed_region_d3gw_znnf":"18",":@computed_region_nmsq_hqvv":"55",":@computed_region_r6rf_p9et":"36",":@computed_region_rayf_jjgk":"295","arrests":"1","county_code":"44","county_code_text":"44","county_name":"Mifflin","fips_county_code":"087","fips_state_code":"42","incident_count":"1","lat_long":{"type":"Point","coordinates":[-77.620031,40.612749]}}
# Flatten nested keys into dotted column names such as 'lat_long.coordinates'
df = pd.json_normalize(data)  # top-level function since pandas 1.0
# .copy() avoids a SettingWithCopyWarning when the new columns are added
df_modified = df[['county_name', 'incident_count', 'lat_long.type']].copy()
# GeoJSON Point coordinates are ordered [longitude, latitude]
df_modified['lng'] = df['lat_long.coordinates'].str[0]
df_modified['lat'] = df['lat_long.coordinates'].str[1]
print(df_modified)
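For more than one record, a minimal sketch follows. It assumes the records arrive as a list of dicts shaped like the one above; the second record and the names records, flat and result are made up for illustration. It passes the whole list to pd.json_normalize and splits the coordinates column-wise instead of using apply:

```python
import pandas as pd

# Hypothetical input: a list of records shaped like the one in the question.
# The second record is invented purely for illustration.
records = [
    {"county_name": "Mifflin", "incident_count": "1",
     "lat_long": {"type": "Point", "coordinates": [-77.620031, 40.612749]}},
    {"county_name": "Example", "incident_count": "2",
     "lat_long": {"type": "Point", "coordinates": [-75.0, 40.0]}},
]

# One row per record; nested keys become 'lat_long.type' and 'lat_long.coordinates'
flat = pd.json_normalize(records)
coords = flat["lat_long.coordinates"]

result = flat[["county_name", "incident_count"]].copy()
# GeoJSON Point coordinates are ordered [longitude, latitude]
result["lng"] = coords.str[0]
result["lat"] = coords.str[1]
print(result)
```

Indexing with .str[0] and .str[1] works element-wise on list-valued columns, so the same two lines handle one row or many.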
Here is another way to do it:
df1 = pd.json_normalize(data)  # data: the JSON record as a dict
# Expand each [longitude, latitude] list into two columns, then drop the originals
result = pd.concat(
    [df1, df1['lat_long.coordinates'].apply(pd.Series).rename(columns={0: 'long', 1: 'lat'})],
    axis=1,
).drop(columns=['lat_long.coordinates', 'lat_long.type'])
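As a variation on the same idea, the coordinate lists can be expanded by building a small DataFrame from them and joining it back, which avoids apply(pd.Series). This is only a sketch: the trimmed-down record and the names flat, coords and out are assumptions for illustration, not part of the original answers.

```python
import pandas as pd

# Trimmed-down version of the record from the question (illustrative only)
record = {
    "county_name": "Mifflin",
    "incident_count": "1",
    "lat_long": {"type": "Point", "coordinates": [-77.620031, 40.612749]},
}

flat = pd.json_normalize(record)

# Build a two-column frame from the [longitude, latitude] lists,
# reusing the original index so the join lines up row by row
coords = pd.DataFrame(
    flat["lat_long.coordinates"].tolist(),
    columns=["long", "lat"],
    index=flat.index,
)

out = flat.drop(columns=["lat_long.coordinates", "lat_long.type"]).join(coords)
print(out)
```

On larger frames this tends to be faster than apply(pd.Series), since it constructs the new columns in one step rather than one Series per row.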