Python web scraping "IndexError: list index out of range"

Question

from selenium import webdriver
import csv
import requests
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path="C:\\Users\\dylan\\Documents\\chromedriver.exe")
data_list=[]


site = requests.get('https://www.visitnh.gov/things-to-do/food-and-drink/restaurants')

if site.status_code is 200:
   content = BeautifulSoup(site.content, 'html.parser')
   Resultswrapper = content.find_all('div', attrs={'results-wrapper'})

for Results in Resultswrapper:
    print("Random phrase: ")
    #print(Results.select(class_='results-wrapper').prettify())
    BusinessName = Results.select('.item-title.ng-binding')[0].get_text()
    Address =   Results.select('.ng-binding')[0].get_text()
    PhoneNumber = Results.select('.ng-binding')[0].get_text()
    new_data = {"BusinessName": BusinessName, "Address": Address, "PhoneNumber": PhoneNumber}
    data_list.append(new_data)
    print("data: " + data_list)
    print("new data: " + new_data)

with open ('find.csv','w') as file:
        writer = csv.DictWriter(file, fieldnames = ["BusinessName", "Address", "PhoneNumber"], delimiter = ';')
        writer.writeheader()
        for row in data_list:
            writer.writerow(row)

I get an "index out of range" error while building a simple web scraper that loops over the results of a selector. I am using this URL: https://www.visitnh.gov/things-to-do/food-and-drink/restaurants

Traceback (most recent call last):
  File "c:/Users/dylan/Documents/Webscrape/web-s.py", line 19, in <module>
    BusinessName = Results.select('p.item-title.ng-binding')[0].get_text()
IndexError: list index out of range

I tried changing the results wrapper to a different element in the HTML, but that did not work either. I also tried changing the selector text, with no success. Any ideas? Any help would be greatly appreciated.

Answer 1

Looking at the website, you can see that the element you are trying to select is not present when the document is first loaded. The list of restaurants is inserted into the page later, which is why your script cannot find it right away.

Maybe wait a bit before selecting, using the built-in time.sleep function.
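
Note that waiting only helps when the page is rendered in a real browser (for example through the Selenium driver the question already creates); requests.get does not run JavaScript, so sleeping before parsing site.content would change nothing. A fixed sleep is also fragile, so here is a minimal sketch of polling instead. The helper name wait_for and its parameters are my own invention, not part of Selenium or BeautifulSoup, and the Selenium call in the comment is an assumption about how you would plug it in:

```python
import time


def wait_for(fetch, timeout=10.0, interval=0.5):
    """Call fetch() until it returns something non-empty or the timeout expires.

    Returns the last result, which is still empty if the timeout was reached,
    so the caller must check it before indexing with [0].
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if result or time.monotonic() >= deadline:
            return result
        time.sleep(interval)


# Hypothetical usage with the driver from the question (not run here):
#   driver.get('https://www.visitnh.gov/things-to-do/food-and-drink/restaurants')
#   titles = wait_for(lambda: driver.find_elements('css selector', '.item-title'))
#   if not titles:
#       print('No results appeared within the timeout')
```

Selenium also ships its own explicit-wait helper, WebDriverWait, which is probably the better choice in real code; the sketch above only illustrates the idea.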

Answer 2

The data is loaded from an external URL via JavaScript. You can use the requests and json modules to simulate this request:

import json
import requests


url = 'https://www.visitnh.gov/BusinessListingService.asmx/GetUsers'
search_params = {"searchParams":{"ResultsPerPage":15,"BusinessType":"attraction","SubcategoryID":25}}
data = json.loads(requests.post(url, json=search_params).json()['d'])

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for result in data['Results']:
    print(result['BusinessName'])
    print(result['Address'])
    print(result['Phone'])
    print('-' * 80)

Prints:

NazBar & Grill
1086 Weirs Blvd, Laconia, NH 03246
(603) 366-4341
--------------------------------------------------------------------------------
Giant of Siam
5 E Hollis Street, Nashua, NH 03060
(603) 595-2222
--------------------------------------------------------------------------------
Hobbs Tavern & Brewing Company
2415 NH Route 16, Ossipee, NH 03890
(603) 539-2000
--------------------------------------------------------------------------------

...and so on. (Total 164 items.)
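
Since the question ends by writing a CSV file, here is a rough sketch of saving these results the same way. It assumes each result is a dict with the BusinessName, Address and Phone keys used above; the function name write_results and the sample rows (copied from the printed output) are only for illustration:

```python
import csv
import io

FIELDS = ["BusinessName", "Address", "Phone"]


def write_results(results, file):
    """Write the selected fields of each result as a ';'-delimited CSV row.

    Keys other than FIELDS are dropped; missing keys become empty strings.
    Returns the number of data rows written.
    """
    writer = csv.DictWriter(file, fieldnames=FIELDS, delimiter=";")
    writer.writeheader()
    count = 0
    for result in results:
        writer.writerow({key: result.get(key, "") for key in FIELDS})
        count += 1
    return count


# Sample rows taken from the printed output above.
sample = [
    {"BusinessName": "NazBar & Grill",
     "Address": "1086 Weirs Blvd, Laconia, NH 03246",
     "Phone": "(603) 366-4341"},
    {"BusinessName": "Giant of Siam",
     "Address": "5 E Hollis Street, Nashua, NH 03060",
     "Phone": "(603) 595-2222"},
]

buffer = io.StringIO()
write_results(sample, buffer)
print(buffer.getvalue())
```

With the real response you would pass data['Results'] and a file opened with open('find.csv', 'w', newline='') in place of the StringIO buffer.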

Answer 1
0 2020-08-31 23:00:12

Answer 2
0 Accepted 2020-08-31 23:06:27
