
Python: how to get the JSON data from multiple links

I am trying to extract the JSON data from multiple links, but it looks like I am doing something wrong: I only get the details for the last id. How do I get the JSON data for all the links? Also, is it possible to export all the results to a CSV file?

Please kindly guide me.

Here is the code that I am using.

import json
import requests
from bs4 import BeautifulSoup

a_list = [234147,234548,232439,234599,226672,234117,222388]
a_url = 'https://jobs.mycareerportal/careers-home/jobs'
urls = []

for n in a_list:
    kurl = '{}/{}'.format(a_url, n)
  
soup = BeautifulSoup(requests.get(kurl).content, "html.parser")
data = [
    json.loads(x.string) for x in soup.find_all("script", type="application/ld+json")
      ]
for d in data:
  k = str(d['url']) + str(d['jobLocation']['address'])
urls.append(kurl) 

print(k)

And this is the output that I am getting:

PS E:\Python> & C:/Users/KristyG/Anaconda3/python.exe e:/Python/url_append.py
https://jobs.mycareerportal/careers-home/jobs/222388?{'@type': 'PostalAddress', 'addressLocality': 'Panama City', 'addressRegion': 'Florida', 'streetAddress': '4121 Hwy 98', 'postalCode': '32401-1170', 'addressCountry': 'United States'}
PS E:\Python>

Please note that I had to change the website name, as I can't share it publicly.

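I guess it's just an indentation problem. In your code only the kurl assignment is inside the for loop, so the request and the parsing run once, after the loop has finished, using the last id. Try nesting the rest of the code inside the first for loop, like this: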

import json
import requests
from bs4 import BeautifulSoup

a_list = [234147, 234548, 232439, 234599, 226672, 234117, 222388]
a_url = 'https://jobs.mycareerportal/careers-home/jobs'
urls = []

for n in a_list:
    kurl = '{}/{}'.format(a_url, n)
    urls.append(kurl)

    # Fetch and parse each page inside the loop, once per id.
    soup = BeautifulSoup(requests.get(kurl).content, "html.parser")
    data = [
        json.loads(x.string) for x in soup.find_all("script", type="application/ld+json")
    ]
    for d in data:
        # Print inside the inner loop so every JSON-LD block is shown,
        # and k is never used before it has been assigned.
        k = str(d['url']) + str(d['jobLocation']['address'])
        print(k)
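
As for the CSV part of the question, here is a minimal sketch using only the standard csv module. It assumes each parsed JSON-LD dict has the shape seen in the output above (a 'url' key and a 'jobLocation' dict holding a PostalAddress under 'address'); the real pages may differ, for example if jobLocation is a list. The names ADDRESS_FIELDS, job_to_row and write_jobs_csv are hypothetical helpers made up for this illustration, and the sample record is copied from the output in the question.

```python
import csv
import io

# Column names taken from the PostalAddress keys visible in the question's output.
ADDRESS_FIELDS = [
    "streetAddress",
    "addressLocality",
    "addressRegion",
    "postalCode",
    "addressCountry",
]


def job_to_row(d):
    """Flatten one parsed JSON-LD job posting into a flat dict of CSV columns."""
    # Assumption: jobLocation is a dict, as d['jobLocation']['address'] implies.
    address = d.get("jobLocation", {}).get("address", {})
    row = {"url": d.get("url", "")}
    for field in ADDRESS_FIELDS:
        # A missing key becomes an empty cell instead of raising KeyError.
        row[field] = address.get(field, "")
    return row


def write_jobs_csv(jobs, fileobj):
    """Write a header line plus one line per job to an open text file object."""
    writer = csv.DictWriter(fileobj, fieldnames=["url"] + ADDRESS_FIELDS)
    writer.writeheader()
    for d in jobs:
        writer.writerow(job_to_row(d))


# Sample record copied from the output shown in the question.
sample_jobs = [
    {
        "url": "https://jobs.mycareerportal/careers-home/jobs/222388?",
        "jobLocation": {
            "address": {
                "@type": "PostalAddress",
                "addressLocality": "Panama City",
                "addressRegion": "Florida",
                "streetAddress": "4121 Hwy 98",
                "postalCode": "32401-1170",
                "addressCountry": "United States",
            }
        },
    }
]

# Written to an in-memory buffer here so the sketch runs without touching disk.
buffer = io.StringIO()
write_jobs_csv(sample_jobs, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In the loop above, each d could be appended to a list (say all_jobs, a made-up name) instead of only being printed. After the loop, open a real file with open('jobs.csv', 'w', newline='', encoding='utf-8') and pass the list and the file object to write_jobs_csv. The newline='' argument is what the csv module documentation recommends to avoid blank lines between rows on Windows.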


 