I'm using the following to parse data from a website:
import requests
import pandas as pd
resp = requests.get("https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1").json()
df = pd.DataFrame(resp['posts'], columns=['episodeNumber','slug','image','excerpt','audioSource'])
# index=False (a boolean) omits the row index; the string 'false' is truthy and would still write it
df.to_csv("output9.csv", encoding='utf-8', index=False)
data = pd.read_csv("output9.csv")
As you can see, I've had to pull the entire 'excerpt' column, which brings in all three excerpts instead of just one. How would I go about pulling only, say, the 'short' one? What is that kind of heading called, if not a 'column'? Also, 'title' doesn't seem to be under any sort of header. How would I pull that too?
A quick visual of the .json is here if it helps: https://www.dropbox.com/s/v9l81ber6i4nbgw/11111111.jpg?dl=0
Any help would be greatly appreciated.
The workaround I can think of is to normalize the resp['posts'] JSON and not specify the columns. Below is the code to generate the resulting dataframe:
import requests
import pandas as pd
resp = requests.get("https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1").json()
# print(resp['posts'][0])
# pd.json_normalize (pandas >= 1.0) replaces the deprecated pandas.io.json.json_normalize;
# it flattens nested dicts into dotted column names such as excerpt.short
df = pd.json_normalize(resp['posts'])
df.to_csv("output2_9.csv", encoding='utf-8', index=False)
Once you have this dataframe, you can select whichever columns you want. It contains every field of the JSON, with the following column names: audioSource, content, date, episodeNumber, excerpt.full, excerpt.long, excerpt.short, id, image.full, image.large, image.medium, image.thumb, musicCredits, next, next.slug, next.title, permalink, prev, prev.slug, prev.title, slug, title.
The title header is also present in this dataframe.
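To illustrate picking columns out of the flattened frame, here is a minimal sketch. It uses a small hand-made `posts` list (hypothetical values that only mimic the shape of resp['posts']) so it runs without hitting the site; the renaming step at the end is an optional extra, not something the answer above requires.

```python
import pandas as pd

# Hypothetical sample shaped like resp['posts']; the values are invented.
posts = [
    {"episodeNumber": 1, "title": "First", "slug": "first",
     "excerpt": {"short": "s1", "long": "l1", "full": "f1"},
     "audioSource": "https://example.com/1.mp3"},
    {"episodeNumber": 2, "title": "Second", "slug": "second",
     "excerpt": {"short": "s2", "long": "l2", "full": "f2"},
     "audioSource": "https://example.com/2.mp3"},
]

# Nested dicts become dotted column names, e.g. excerpt.short
flat = pd.json_normalize(posts)

# Keep only the wanted columns, including title and the short excerpt
subset = flat[["episodeNumber", "title", "slug", "excerpt.short", "audioSource"]]

# Optionally give the flattened column a plainer name
subset = subset.rename(columns={"excerpt.short": "excerpt"})
```

With the real response, replacing `posts` with resp['posts'] should give the same column layout, assuming every post carries these fields.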
I've taken the excerpt series, called the apply function on it, and kept the 'short' series that apply created. You might have to handle the extra double quotes; consider the following code:
import requests
import pandas as pd
resp = requests.get("https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1").json()
df = pd.DataFrame(resp['posts'], columns=['episodeNumber','slug','image','excerpt','audioSource'])
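# Expand each excerpt dict into its own columns, then keep only 'short'
df['excerpt'] = df['excerpt'].apply(pd.Series)['short']  # .replace({'"': '\'','""': '\'','"""': '\'' }, regex=True)
# index=False (a boolean) omits the row index; the string 'false' is truthy and would still write it
df.to_csv("output9.csv", encoding='utf-8', index=False)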
data = pd.read_csv("output9.csv")
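A variation on the same idea, as a rough sketch: instead of expanding every excerpt dict with apply(pd.Series), map each dict straight to its 'short' value, and include 'title' in the column list since it is a top-level key. The `posts` list and the `pick_short` helper below are hypothetical stand-ins, not part of the site's API.

```python
import pandas as pd

# Hypothetical sample shaped like resp['posts']; the values are invented.
posts = [
    {"episodeNumber": 1, "title": "First",
     "excerpt": {"short": "s1", "long": "l1", "full": "f1"}},
    {"episodeNumber": 2, "title": "Second",
     "excerpt": {"short": "s2", "long": "l2", "full": "f2"}},
]

def pick_short(excerpt):
    """Return the 'short' entry of an excerpt dict, or None if it is missing."""
    return excerpt.get("short") if isinstance(excerpt, dict) else None

# 'title' is a top-level key, so it can be listed like any other column
df = pd.DataFrame(posts, columns=["episodeNumber", "title", "excerpt"])

# Replace each excerpt dict with just its 'short' text
df["excerpt"] = df["excerpt"].map(pick_short)

# With no path given, to_csv returns the CSV text; index=False drops the row index
csv_text = df.to_csv(index=False)
```

Mapping avoids building the intermediate 'long' and 'full' columns, which may matter when the response holds many posts.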