Python web scraping “IndexError: list index out of range”
from selenium import webdriver
import csv
import requests
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path="C:\\Users\\dylan\\Documents\\chromedriver.exe")
data_list = []
site = requests.get('https://www.visitnh.gov/things-to-do/food-and-drink/restaurants')
if site.status_code == 200:
    content = BeautifulSoup(site.content, 'html.parser')
    # Every <div> whose class is "results-wrapper"
    Resultswrapper = content.find_all('div', class_='results-wrapper')
    for Results in Resultswrapper:
        print("Random phrase: ")
        #print(Results.select(class_='results-wrapper').prettify())
        # The IndexError is raised on the next line: select() returns an empty list
        BusinessName = Results.select('.item-title.ng-binding')[0].get_text()
        Address = Results.select('.ng-binding')[0].get_text()
        PhoneNumber = Results.select('.ng-binding')[0].get_text()
        new_data = {"BusinessName": BusinessName, "Address": Address, "PhoneNumber": PhoneNumber}
        data_list.append(new_data)
        print("data: " + str(data_list))
        print("new data: " + str(new_data))
    with open('find.csv', 'w', newline='') as file:
        writer = csv.DictWriter(file, fieldnames=["BusinessName", "Address", "PhoneNumber"], delimiter=';')
        writer.writeheader()
        for row in data_list:
            writer.writerow(row)
I get an index out of range error when trying to make a simple web scraper that loops over the results using a selector. I use this URL: https://www.visitnh.gov/things-to-do/food-and-drink/restaurants
Traceback (most recent call last):
  File "c:/Users/dylan/Documents/Webscrape/web-s.py", line 19, in <module>
    BusinessName = Results.select('p.item-title.ng-binding')[0].get_text()
IndexError: list index out of range
I tried changing the results wrapper to something different in the HTML, but it's not the same. I also tried messing around with the text in the select, but to no avail. Any ideas? Any help would be greatly appreciated.
Looking at the website, you can see that the element you are trying to select is not present when the document is first ready. The list of restaurants on this website is inserted into the page later, which is why your script cannot find it right away.
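The IndexError follows directly from this: select() returns a plain list, the list is empty because the listings are not in the initial HTML, and [0] on an empty list raises. Below is a minimal sketch of a guard that makes the empty case explicit. `first_or_none` is a hypothetical helper, not part of BeautifulSoup, and the two lists are hand-written stand-ins for what select() might return.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def first_or_none(matches: Sequence[T]) -> Optional[T]:
    """Return the first match, or None when the selector matched nothing."""
    return matches[0] if matches else None


# Stand-ins for select() results (assumed values, for illustration only).
empty_matches: list = []             # selector found nothing in the static HTML
found_matches = ["NazBar & Grill"]   # selector found one element

print(first_or_none(empty_matches))  # None
print(first_or_none(found_matches))  # NazBar & Grill
```

This does not make the missing data appear; it only turns the crash into a value the script can check. BeautifulSoup's own select_one() behaves similarly, returning None when nothing matches.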
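Maybe wait a bit before selecting, using the built-in time.sleep function. Note that a delay only helps when the page is loaded in a real browser, for example through the Selenium driver; requests.get returns the initial HTML and never runs the page's JavaScript.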
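A fixed sleep can be too short on a slow connection and wasteful on a fast one. As a rough sketch of the same idea, the helper below polls until something is found or a timeout expires. `wait_for` and `fake_probe` are hypothetical names introduced for illustration; the fake probe stands in for a real lookup in the browser.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(probe: Callable[[], T], timeout: float = 10.0, interval: float = 0.5) -> T:
    """Call probe() until it returns a truthy value or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = probe()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} seconds")
        time.sleep(interval)


# Demo with a fake probe that finds nothing on its first two calls.
attempts = {"count": 0}


def fake_probe() -> list:
    attempts["count"] += 1
    return ["listing"] if attempts["count"] >= 3 else []


print(wait_for(fake_probe, timeout=2.0, interval=0.01))  # ['listing']
```

In a Selenium session the probe could be a call such as driver.find_elements(...) made after driver.get(...) has loaded the page. Selenium also ships its own WebDriverWait class for this purpose, which is likely the better choice in real code.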
The data is loaded from an external URL via JavaScript. You can use the requests / json modules to simulate this request:
import json
import requests

url = 'https://www.visitnh.gov/BusinessListingService.asmx/GetUsers'
search_params = {"searchParams":{"ResultsPerPage":15,"BusinessType":"attraction","SubcategoryID":25}}

# The response is JSON whose 'd' field is itself a JSON string, so decode twice
data = json.loads(requests.post(url, json=search_params).json()['d'])

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for result in data['Results']:
    print(result['BusinessName'])
    print(result['Address'])
    print(result['Phone'])
    print('-' * 80)
Prints:
NazBar & Grill
1086 Weirs Blvd, Laconia, NH 03246
(603) 366-4341
--------------------------------------------------------------------------------
Giant of Siam
5 E Hollis Street, Nashua, NH 03060
(603) 595-2222
--------------------------------------------------------------------------------
Hobbs Tavern & Brewing Company
2415 NH Route 16, Ossipee, NH 03890
(603) 539-2000
--------------------------------------------------------------------------------
...and so on. (Total 164 items.)
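To get from this response to the CSV file the question was aiming for, something like the following sketch could work. It assumes the response has the shape used in the code above (a 'd' field holding a JSON string with a 'Results' list). `parse_listing_response`, `write_results_csv` and `sample_payload` are hypothetical names, and the sample payload is hand-built from the first two results printed above rather than fetched from the service.

```python
import csv
import io
import json
from typing import Dict, Iterable, List, TextIO

FIELDS = ["BusinessName", "Address", "Phone"]


def parse_listing_response(payload: dict) -> List[Dict[str, str]]:
    """Decode the JSON string stored under 'd' and return its 'Results' list."""
    inner = json.loads(payload["d"])
    return inner["Results"]


def write_results_csv(results: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write the chosen fields as semicolon-delimited CSV, ignoring other keys."""
    writer = csv.DictWriter(out, fieldnames=FIELDS, delimiter=";", extrasaction="ignore")
    writer.writeheader()
    for row in results:
        writer.writerow(row)


# Hand-built stand-in for requests.post(url, json=search_params).json()
sample_payload = {
    "d": json.dumps({
        "Results": [
            {"BusinessName": "NazBar & Grill",
             "Address": "1086 Weirs Blvd, Laconia, NH 03246",
             "Phone": "(603) 366-4341"},
            {"BusinessName": "Giant of Siam",
             "Address": "5 E Hollis Street, Nashua, NH 03060",
             "Phone": "(603) 595-2222"},
        ]
    })
}

buffer = io.StringIO()
write_results_csv(parse_listing_response(sample_payload), buffer)
print(buffer.getvalue())
```

To write a real file instead of an in-memory buffer, pass a file opened with open('find.csv', 'w', newline=''); the csv module documentation recommends newline='' so that line endings are not doubled on Windows.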