简体   繁体   中英

Export JSON to CSV using Python

I wrote a code to extract some information from a website. the output is in JSON and I want to export it to CSV. So, I tried to convert it to a pandas dataframe and then export it to CSV in pandas. I can print the results but still, it doesn't convert the file to a pandas dataframe. Do you know what the problem with my code is?

# -*- coding: utf-8 -*-
# To create http request/session 
import requests
import re, urllib
import pandas as pd
from BeautifulSoup import BeautifulSoup

url = "https://www.indeed.com/jobs? 
q=construction%20manager&l=Houston&start=10"

# create session
s = requests.session()
html = s.get(url).text

# exctract job IDs
job_ids = ','.join(re.findall(r"jobKeysWithInfo\['(.+?)'\]", html))
ajax_url = 'https://www.indeed.com/rpc/jobdescs?jks=' + 
urllib.quote(job_ids)
# do Ajax request and convert the response to json 
ajax_content = s.get(ajax_url).json()
print(ajax_content)
#Convert to pandas dataframe
df = pd.read_json(ajax_content)
#Export to CSV
df.to_csv("c:\\users\\Name\desktop\\newcsv.csv")

The error message is:

Traceback (most recent call last):

File "C:\\Users\\Mehrdad\\Desktop\\Indeed 06.py", line 21, in df = pd.read_json(ajax_content)

File "c:\\python27\\lib\\site-packages\\pandas\\io\\json\\json.py", line 408, in read_json path_or_buf, encoding=encoding, compression=compression,

File "c:\\python27\\lib\\site-packages\\pandas\\io\\common.py", line 218, in get_filepath_or_buffer raise ValueError(msg.format(_type=type(filepath_or_buffer)))

ValueError: Invalid file path or buffer object type:

The problem was that nothing was going into the dataframe when you called read_json() because it was a nested JSON dict:

import requests
import re, urllib
import pandas as pd
from pandas.io.json import json_normalize

url = "https://www.indeed.com/jobs?q=construction%20manager&l=Houston&start=10"

s = requests.session()
html = s.get(url).text

job_ids = ','.join(re.findall(r"jobKeysWithInfo\['(.+?)'\]", html))
ajax_url = 'https://www.indeed.com/rpc/jobdescs?jks=' + urllib.quote(job_ids)

ajax_content= s.get(ajax_url).json()
df = json_normalize(ajax_content).transpose()
df.to_csv('your_output_file.csv')

Note that I called json_normalize() to collapse the nested columns from the JSON. I also called transpose() so that the rows were labelled with the job ID rather than columns. This will give you a dataframe that looks like this:

0079ccae458b4dcf    <p><b>Company Environment: </b></p><p>Planet F...
0c1ab61fe31a5c62    <p><b>Commercial Construction Project Manager<...
0feac44386ddcf99    <div><div>Trendmaker Homes is currently seekin...
...

It's not really clear what your expected output is, though ... what are you expecting the DataFrame/CSV file to look like?. If you actually were looking for just a single row/Series with the job ID's as column labels, just remove the call to transpose()

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM