Unable to export particular data from a .json file from a website

I'm using the following to parse data from a website:

    import requests
    import pandas as pd

    resp = requests.get("https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1").json()
    df = pd.DataFrame(resp['posts'], columns=['episodeNumber','slug','image','excerpt','audioSource'])
    # index must be the boolean False; the string 'false' is truthy and still writes the index
    df.to_csv("output9.csv", encoding='utf-8', index=False)

    data = pd.read_csv("output9.csv")

As you can see, I've had to pull the entire 'excerpt' column, which brings in all three excerpt variants instead of just one. How would I pull just, say, the 'short' one? What is a nested field like this called, if not a 'column'? Also, 'title' doesn't seem to be under any sort of header. How would I pull this too?

A quick visual of the .json is here if it helps: https://www.dropbox.com/s/v9l81ber6i4nbgw/11111111.jpg?dl=0

Any help would be greatly appreciated.

One workaround I can think of is to normalize the resp['posts'] JSON and not specify the columns at all. The code below generates a flattened dataframe:

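    import requests
    import pandas as pd

    resp = requests.get("https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1").json()
    # print(resp['posts'][0])  # uncomment to inspect one raw record
    # pd.json_normalize (pandas >= 1.0) flattens nested dicts into dotted
    # column names; older versions: from pandas.io.json import json_normalize
    df = pd.json_normalize(resp['posts'])
    # index=False (boolean) skips the row index; the string 'false' would not
    df.to_csv("output2_9.csv", encoding='utf-8', index=False)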

Once you have this dataframe, you can select whichever columns you want. It contains every field of the JSON, with the following column names:

    audioSource, content, date, episodeNumber,
    excerpt.full, excerpt.long, excerpt.short, id,
    image.full, image.large, image.medium, image.thumb,
    musicCredits, next, next.slug, next.title, permalink,
    prev, prev.slug, prev.title, slug, title

The title column is also present in this dataframe.
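To illustrate picking columns out of the normalized dataframe, here is a minimal sketch. The `posts` list is made-up sample data that only mimics the nested shape of `resp['posts']` (the real records have more fields), and the name `subset` is just illustrative:

```python
import pandas as pd

# Hypothetical records shaped like resp['posts']; all values are invented.
posts = [
    {
        "episodeNumber": 1,
        "title": "Example One",
        "slug": "example-one",
        "excerpt": {"short": "Short text 1", "long": "Long text 1", "full": "Full text 1"},
        "image": {"thumb": "t1.jpg", "full": "f1.jpg"},
    },
    {
        "episodeNumber": 2,
        "title": "Example Two",
        "slug": "example-two",
        "excerpt": {"short": "Short text 2", "long": "Long text 2", "full": "Full text 2"},
        "image": {"thumb": "t2.jpg", "full": "f2.jpg"},
    },
]

# Nested dicts become dotted column names such as 'excerpt.short'.
flat = pd.json_normalize(posts)

# Keep only the wanted columns and give the nested one a plain name.
subset = flat[["episodeNumber", "title", "slug", "excerpt.short"]]
subset = subset.rename(columns={"excerpt.short": "excerpt"})

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = subset.to_csv(index=False)
print(csv_text)
```

With the real response you would pass `resp['posts']` instead of the sample list and give `to_csv` a file name.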

I've taken the excerpt series, called apply(pd.Series) on it to expand each dict into separate columns, and kept the resulting 'short' series. You might have to handle the extra double quotes in the text. Consider the following code:

    import requests
    import pandas as pd

    resp = requests.get("https://thisiscriminal.com/wp-json/criminal/v1/episodes?posts=1000000&page=1").json()
    df = pd.DataFrame(resp['posts'], columns=['episodeNumber','slug','image','excerpt','audioSource'])
    # Expand each excerpt dict into columns and keep only 'short'
    df['excerpt'] = df['excerpt'].apply(pd.Series)['short']#.replace({'"': '\'','""': '\'','"""': '\'' }, regex=True)
    # index=False (boolean) skips the row index; the string 'false' would not
    df.to_csv("output9.csv", encoding='utf-8', index=False)
    data = pd.read_csv("output9.csv")
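As a lighter variation on the apply(pd.Series) step, the sketch below pulls the 'short' key out of each dict directly with `map`, and also shows what happens to embedded double quotes in a CSV round trip. The `posts` list is invented sample data, and 'title' is assumed to be a top-level key of each record:

```python
import io
import pandas as pd

# Hypothetical records shaped like resp['posts']; all values are invented.
posts = [
    {"episodeNumber": 1, "title": "Example One",
     "excerpt": {"short": 'A "quoted" short text', "long": "Long text 1", "full": "Full text 1"}},
    {"episodeNumber": 2, "title": "Example Two",
     "excerpt": {"short": "Short text 2", "long": "Long text 2", "full": "Full text 2"}},
]

# 'title' is requested like any other top-level column.
df = pd.DataFrame(posts, columns=["episodeNumber", "title", "excerpt"])

# Replace each excerpt dict with its 'short' value; non-dict cells become None.
df["excerpt"] = df["excerpt"].map(
    lambda d: d.get("short") if isinstance(d, dict) else None
)

# Round-trip through CSV in memory. The writer doubles embedded quotes
# and wraps the field in quotes; read_csv undoes that on the way back in.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
restored = pd.read_csv(buffer)

print(buffer.getvalue())
print(restored["excerpt"].tolist())
```

The doubled quotes only exist in the file on disk; after read_csv the text is back to its original form, so the commented-out replace call is probably only needed if you want to change the quotes themselves.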

