
Web Scraping Error: Request and No Connection

I am trying to write my first program in Python. The intent of the web scraping program is to pull prices from potentially 100 or more websites for multiple types of products. I was able to write the program for one website and export the results to a spreadsheet with no issue. However, I am now having issues when trying to web scrape multiple sites.

I am trying to place more than one URL into a list, then create a for loop to run the same code for each URL. Here is the code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

#Aero Stripped Lowers
url = ['https://www.aeroprecisionusa.com/ar15/lower-receivers/stripped-lowers?product_list_limit=all', 'https://www.aeroprecisionusa.com/ar15/lower-receivers/complete-lowers?product_list_limit=all']
for website in url:
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0"}
    page = requests.get(url, headers=headers)
    soup = BeautifulSoup(page.content, 'html.parser')

#Locating All Stripped Aero Lowers On Site
all_aero_stripped_lowers = soup.find(class_='products wrapper container grid products-grid')
items = all_aero_stripped_lowers.find_all(class_='product-item-info')

#Identifying All Aero Stipped Lower Names And Prices
aero_stripped_lower_names = [item.find(class_='product-item-link').text for item in items]
aero_stripped_lower_prices = [item.find(class_='price').text for item in items]


Aero_Stripped_Lowers_Consolidated = pd.DataFrame(
    {'Aero Stripped Lower': aero_stripped_lower_names,
     'Prices': aero_stripped_lower_prices,
     })

Aero_Stripped_Lowers_Consolidated.to_csv('MasterPriceTracker.csv')

I am receiving the error below:

Traceback (most recent call last):
  File "C:/Users/ComputerName/Documents/PyCharm_Projects/Aero Stripped Lower List/NewAeroStrippedLower.py", line 9, in <module>
    page = requests.get(url, headers=headers)
  File "C:\Python\Python38\lib\site-packages\requests\api.py", line 75, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Python\Python38\lib\site-packages\requests\api.py", line 60, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python\Python38\lib\site-packages\requests\sessions.py", line 533, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Python\Python38\lib\site-packages\requests\sessions.py", line 640, in send
    adapter = self.get_adapter(url=request.url)
  File "C:\Python\Python38\lib\site-packages\requests\sessions.py", line 731, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '['https://www.aeroprecisionusa.com/ar15/lower-receivers/stripped-lowers?product_list_limit=all', 'https://www.aeroprecisionusa.com/ar15/lower-receivers/complete-lowers?product_list_limit=all']'

Thanks in advance for any help you may be able to provide!

You're passing the whole list `url` to `requests.get()` instead of the single string `website` that your loop variable holds. requests raises `InvalidSchema` before it even opens a connection, because the stringified list doesn't start with a scheme like `https://`. It's a simple mistake:

# -- snip --

for website in url:
    headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0"}
    page = requests.get(website, headers=headers) # not 'url'
    soup = BeautifulSoup(page.content, 'html.parser')

# -- snip --
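One more thing worth noting: in the posted code the parsing and `DataFrame` steps sit outside the `for` loop, so even with the variable fixed, only the last page fetched would ever be processed. Below is a minimal sketch of one way to accumulate rows from every page into a single CSV. It assumes both pages use the same Magento-style `products-grid`, `product-item-info`, `product-item-link`, and `price` classes that the original code targets; the helper names (`parse_products`, `scrape_all`) are just illustrative:

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

HEADERS = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0"}

def parse_products(html):
    """Extract (name, price) pairs from one product-grid page."""
    soup = BeautifulSoup(html, 'html.parser')
    # Matching the exact class string works in BeautifulSoup for multi-class attributes
    grid = soup.find(class_='products wrapper container grid products-grid')
    if grid is None:  # layout changed or the request was blocked
        return []
    return [
        (item.find(class_='product-item-link').get_text(strip=True),
         item.find(class_='price').get_text(strip=True))
        for item in grid.find_all(class_='product-item-info')
    ]

def scrape_all(urls, csv_path='MasterPriceTracker.csv'):
    """Fetch every URL, accumulate rows across pages, write one combined CSV."""
    rows = []
    for website in urls:  # note: iterate over the list, fetch each string
        page = requests.get(website, headers=HEADERS)
        rows.extend(parse_products(page.content))
    df = pd.DataFrame(rows, columns=['Aero Stripped Lower', 'Prices'])
    df.to_csv(csv_path)
    return df
```

Keeping the parsing in its own function also makes it easy to test against saved HTML without hitting the sites, which matters once you scale this to 100+ URLs.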
