
Converting a JSON Link into a Pandas DataFrame

I have a JSON data source, https://data.cdc.gov/api/views/x8jf-txib/rows.json, and I want to convert it into a pandas DataFrame.

The JSON consists of metadata followed by the actual data. I would like to store the metadata in one file and the dataset in another file on my local system.

I have written the following, but I cannot get it to work completely:

import json
from urllib.request import urlopen

import pandas as pd

# Get the dataset
url = "https://data.cdc.gov/api/views/x8jf-txib/rows.json"
response = urlopen(url)

# Convert bytes to string type and string type to dict
string = response.read().decode('utf-8')
json_obj = json.loads(string)

The step above converts the JSON into a dictionary. When I try to convert that dictionary into a pandas DataFrame with:

pd.DataFrame([json_obj.items()])

I get this output:

[screenshot of the output; image not available]

Please help me with this. I appreciate it.

In Python, json.loads returns a dict when the JSON text is an object and parses successfully. I think what you want for constructing the DataFrame is the following:

df = pd.DataFrame.from_records(json_obj['data'])
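
To see why selecting the 'data' key matters, a minimal sketch with a tiny hand-made dictionary may help. The `sample` object below is hypothetical: it only imitates the two top-level keys (`meta` and `data`) that the real response is assumed to have, and its values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for json_obj: two top-level keys, as assumed for the real feed.
sample = {
    "meta": {"view": {"name": "example"}},
    "data": [
        ["row-1", "A", 10],
        ["row-2", "B", 20],
    ],
}

# The top-level dict has only two entries, so building a frame from it
# directly cannot yield one row per record.
print(list(sample.keys()))

# Each inner list under "data" is one record, which from_records maps to one row.
df = pd.DataFrame.from_records(sample["data"])
print(df.shape)
```

The same reasoning should apply to the real `json_obj`, provided it really does keep its records as a list of lists under 'data'.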

Here's a working script:

import pandas as pd
from urllib.request import urlopen
import json

# Get the dataset
url = "https://data.cdc.gov/api/views/x8jf-txib/rows.json"
response = urlopen(url)

# Convert bytes to string type and string type to dict
string = response.read().decode('utf-8')
json_obj = json.loads(string)

df = pd.DataFrame.from_records(json_obj['data'])
print(df.head())

You should get output that looks something like:

                   0                                     1   2   ...                            38    39    40
0  row-ss5i~ibqh-im6e  00000000-0000-0000-E6C3-33C094361E41   0  ...                          None  None  None
1  row-7jrs-n8wf_crzs  00000000-0000-0000-22EC-13B75E5E7127   0  ...                          None  None  None
2  row-ddqq-yzd7.yyhz  00000000-0000-0000-319D-A1D4FB17A377   0  ...                          None  None  None
3  row-kzem-t4xs.n4ss  00000000-0000-0000-6ED5-CF3857CC1862   0  ...                          None  None  None
4  row-9ws9-2nrx~xqqg  00000000-0000-0000-3403-E46EFF15AE5B   0  ...  POINT (-89.148632 40.124144)  1721    34
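
The integer column labels appear because the rows under 'data' are plain lists with no names attached. The question also asked for the metadata and the data to be stored in separate files. The following is a rough sketch of one way to do both, not a verified solution for this specific dataset. It assumes the column descriptions live under json_obj['meta']['view']['columns'], each with a 'name' field; that layout is an assumption worth checking against the actual response. The helper name `split_and_save` and the default file names are made up for illustration.

```python
import json

import pandas as pd


def split_and_save(json_obj, meta_path="metadata.json", data_path="data.csv"):
    """Write metadata and rows to separate files; return the labeled DataFrame.

    Assumes json_obj has "meta" and "data" keys, and that
    json_obj["meta"]["view"]["columns"] is a list of dicts with a "name" key.
    """
    meta = json_obj["meta"]
    rows = json_obj["data"]

    # Use the metadata column names only if they line up with the row width;
    # otherwise fall back to pandas' default integer labels.
    names = [col.get("name") for col in meta.get("view", {}).get("columns", [])]
    if rows and len(names) == len(rows[0]):
        df = pd.DataFrame.from_records(rows, columns=names)
    else:
        df = pd.DataFrame.from_records(rows)

    # Metadata stays as JSON; the tabular part goes to CSV.
    with open(meta_path, "w", encoding="utf-8") as fh:
        json.dump(meta, fh, indent=2)
    df.to_csv(data_path, index=False)
    return df
```

With the `json_obj` from the script above, calling `split_and_save(json_obj)` would write the two files to the current directory and return the DataFrame. If the number of metadata columns does not match the row width, the sketch deliberately keeps the integer labels rather than guessing.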
