I've tried json_normalize, and this seems to work; however, it does not print my desired output.
import json
import requests
import pandas as pd
url = "https://www.qnt.io/api/results?pID=gifgif&mID=54a309ae1c61be23aba0da62&key=54a309ac1c61be23aba0da3f"
aResponse = requests.get(url)
y = json.loads(aResponse.content)
# Pretty-print the raw JSON so it is easier to read
json_test = json.dumps(y, indent=4, sort_keys=True)
print(json_test)
# pd.json_normalize replaces the deprecated pandas.io.json.json_normalize import
csv = pd.json_normalize(y['results'])
print(csv)
The output of this code is long and confusing, so I have left it out. If it would be useful, I can add it.
The json.dumps portion simply organizes my JSON so that it is easy to read. Unfortunately, I can't post the entire JSON file because Stack isn't a huge fan of my formatting. Here is a small snippet:
{
    "query_parameters": {
        "limit": 10,
        "mID": "54a309ae1c61be23aba0da62",
        "skip": 0,
        "sort": 1
    },
    "results": [
        {
            "cID": "5314ab42d34b6c5b402aead4",
            "content": "BE9kUwvLfsAmI",
            "content_data": {
                "added_with_admin": false,
                "dateAdded": 1393863490.072894,
                "embedLink": "http://media3.giphy.com/media/BE9kUwvLfsAmI/giphy.gif",
                "still_image": "http://media.giphy.com/media/BE9kUwvLfsAmI/200_s.gif",
                "tags": [
                    "adam levine",
                    "embarassed",
                    "the voice",
                    "confession"
                ]
            },
            "content_type": "gif",
            "index": 269,
            "parameters": {
                "mu": 35.92818823777915,
                "sigma": 1.88084276812386
            },
            "rank": 0
        },
There are about 10 more of these (the full data ranges all the way up to 6119, but I'm trying to get just part of it working first). I want my output to be ordered as such: rank, tags, embedLink, mu, sigma, index. Here is an example of my desired output:
0, adam levine, embarassed, the voice, confession, http://media3.giphy.com/media/BE9kUwvLfsAmI/giphy.gif, 35.92818823777915, 1.88084276812386, 269
I would like to have it as a CSV file; however, I think creating a dataframe using Pandas could also be quite useful. I think my problem occurs because I have such a large, nested JSON file, and it's hard to organize this large dataset. Any advice would be appreciated!
First, you can call response.json() on the requests response object, instead of passing response.content to json.loads,
to get the response content already parsed from JSON.
import requests
import pandas as pd
from pprint import pprint
url = "https://www.qnt.io/api/results?pID=gifgif&mID=54a309ae1c61be23aba0da62&key=54a309ac1c61be23aba0da3f"
response = requests.get(url)
results = response.json()["results"]
# pprint(results)
[{'cID': '5314ab42d34b6c5b402aead4',
'content': 'BE9kUwvLfsAmI',
'content_data': {'added_with_admin': False,
'dateAdded': 1393863490.072894,
'embedLink': 'http://media3.giphy.com/media/BE9kUwvLfsAmI/giphy.gif',
'still_image': 'http://media.giphy.com/media/BE9kUwvLfsAmI/200_s.gif',
'tags': ['adam levine',
'embarassed',
'the voice',
'confession']},
'content_type': 'gif',
'index': 269,
'parameters': {'mu': 35.92818823777915, 'sigma': 1.88084276812386},
'rank': 0},
{'cID': '5314ab4dd34b6c5b402aeb97',
...
Then you can load the list of dicts with pd.DataFrame.from_dict:
df = pd.DataFrame.from_dict(results)
# print(df.head(2))
cID content \
0 5314ab42d34b6c5b402aead4 BE9kUwvLfsAmI
1 5314ab4dd34b6c5b402aeb97 NZhO1SEuFmhj2
content_data content_type index \
0 {'embedLink': 'http://media3.giphy.com/media/B... gif 269
1 {'embedLink': 'http://media1.giphy.com/media/N... gif 464
parameters rank
0 {'mu': 35.92818823777915, 'sigma': 1.880842768... 0
1 {'mu': 35.70238333972232, 'sigma': 1.568292935... 1
Then use .apply(pd.Series)
to expand the dict-valued columns (content_data and parameters) into separate columns:
df = pd.concat([df.drop(["content_data"], axis=1), df["content_data"].apply(pd.Series)], axis=1)
df = pd.concat([df.drop(["parameters"], axis=1), df["parameters"].apply(pd.Series)], axis=1)
# print(df.head(2))
cID content content_type index rank \
0 5314ab42d34b6c5b402aead4 BE9kUwvLfsAmI gif 269 0
1 5314ab4dd34b6c5b402aeb97 NZhO1SEuFmhj2 gif 464 1
added_with_admin dateAdded \
0 False 1.393863e+09
1 False 1.393864e+09
embedLink \
0 http://media3.giphy.com/media/BE9kUwvLfsAmI/gi...
1 http://media1.giphy.com/media/NZhO1SEuFmhj2/gi...
still_image \
0 http://media.giphy.com/media/BE9kUwvLfsAmI/200...
1 http://media.giphy.com/media/NZhO1SEuFmhj2/200...
tags mu sigma
0 [adam levine, embarassed, the voice, confession] 35.928188 1.880843
1 [ryan gosling, facepalm, embarrassed, confession] 35.702383 1.568293
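As a possible alternative to the two concat/apply steps above, pd.json_normalize (the function mentioned in the question) can flatten the nested dicts in one call. This is only a sketch, assuming pandas 1.0 or later; sample_results is an illustrative name holding one record copied from the question, standing in for response.json()["results"]:

```python
import pandas as pd

# One record copied from the question's snippet; stands in for response.json()["results"].
sample_results = [
    {
        "cID": "5314ab42d34b6c5b402aead4",
        "content": "BE9kUwvLfsAmI",
        "content_data": {
            "added_with_admin": False,
            "dateAdded": 1393863490.072894,
            "embedLink": "http://media3.giphy.com/media/BE9kUwvLfsAmI/giphy.gif",
            "still_image": "http://media.giphy.com/media/BE9kUwvLfsAmI/200_s.gif",
            "tags": ["adam levine", "embarassed", "the voice", "confession"],
        },
        "content_type": "gif",
        "index": 269,
        "parameters": {"mu": 35.92818823777915, "sigma": 1.88084276812386},
        "rank": 0,
    }
]

# Nested dict keys become dotted column names such as "content_data.tags";
# list values (the tags) are left as lists inside the cells.
flat = pd.json_normalize(sample_results)

# Strip the parent prefix so the columns match the names used in this answer.
flat.columns = [name.split(".")[-1] for name in flat.columns]
print(sorted(flat.columns))
```

Stripping the prefix is safe here only because content_data and parameters share no key names; if they did, keeping the dotted names would avoid duplicate columns.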
Then convert the tags from a list to a single string:
df["tags"] = df["tags"].apply(lambda x: ", ".join(x))
# print(df.head(2)["tags"])
0 adam levine, embarassed, the voice, confession
1 ryan gosling, facepalm, embarrassed, confession
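The join above assumes every row's tags value is a list. If some records had no tags (an assumption; the snippet shown does not confirm this either way), the cell would be NaN after the expansion and the lambda would raise a TypeError. A hedged sketch of a more defensive version, using a hypothetical helper named join_tags and a toy frame:

```python
import pandas as pd

def join_tags(value):
    """Join a list of tags into one string; return "" for anything that is not a list."""
    if isinstance(value, list):
        return ", ".join(str(tag) for tag in value)
    return ""

# Toy frame: the second row simulates a record with no tags (hypothetical case).
demo = pd.DataFrame({"tags": [["adam levine", "embarassed"], float("nan")]})
demo["tags"] = demo["tags"].apply(join_tags)
print(demo["tags"].tolist())
```

In the answer's code this would be used as df["tags"] = df["tags"].apply(join_tags).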
Finally, select the columns you want, in the order you want:
df = df[["rank", "tags", "embedLink", "mu", "sigma", "index"]]
# print(df.head(2))
rank tags \
0 0 adam levine, embarassed, the voice, confession
1 1 ryan gosling, facepalm, embarrassed, confession
embedLink mu sigma \
0 http://media3.giphy.com/media/BE9kUwvLfsAmI/gi... 35.928188 1.880843
1 http://media1.giphy.com/media/NZhO1SEuFmhj2/gi... 35.702383 1.568293
index
0 269
1 464
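Since the question asks for a CSV file, the last step could be DataFrame.to_csv. The sketch below uses a toy frame named out with the same columns as the final df (values copied from the first row) and writes to an in-memory buffer so that it runs without touching the disk; the file name "results.csv" mentioned in the comment is only an example:

```python
import io
import pandas as pd

# Toy frame with the same columns as the final df above.
out = pd.DataFrame(
    {
        "rank": [0],
        "tags": ["adam levine, embarassed, the voice, confession"],
        "embedLink": ["http://media3.giphy.com/media/BE9kUwvLfsAmI/giphy.gif"],
        "mu": [35.92818823777915],
        "sigma": [1.88084276812386],
        "index": [269],
    }
)

# Write to an in-memory buffer here; pass a path such as "results.csv" to write a file.
buffer = io.StringIO()
out.to_csv(buffer, index=False)  # index=False leaves out the DataFrame's row labels
csv_text = buffer.getvalue()
print(csv_text)
```

Because the joined tags contain commas, to_csv wraps that field in double quotes by default, so the tags stay in a single CSV column rather than spilling into four as in the desired-output line of the question.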