Python web scraping "IndexError: list index out of range"

from selenium import webdriver
import csv
import requests
from bs4 import BeautifulSoup

driver = webdriver.Chrome(executable_path="C:\\Users\\dylan\\Documents\\chromedriver.exe")
data_list=[]


site = requests.get('https://www.visitnh.gov/things-to-do/food-and-drink/restaurants')

if site.status_code is 200:
   content = BeautifulSoup(site.content, 'html.parser')
   Resultswrapper = content.find_all('div', attrs={'results-wrapper'})

for Results in Resultswrapper:
    print("Random phrase: ")
    #print(Results.select(class_='results-wrapper').prettify())
    BusinessName = Results.select('.item-title.ng-binding')[0].get_text()
    Address =   Results.select('.ng-binding')[0].get_text()
    PhoneNumber = Results.select('.ng-binding')[0].get_text()
    new_data = {"BusinessName": BusinessName, "Address": Address, "PhoneNumber": PhoneNumber}
    data_list.append(new_data)
    print("data: " + data_list)
    print("new data: " + new_data)

with open ('find.csv','w') as file:
        writer = csv.DictWriter(file, fieldnames = ["BusinessName", "Address", "PhoneNumber"], delimiter = ';')
        writer.writeheader()
        for row in data_list:
            writer.writerow(row)

I get an "index out of range" error while building a simple web scraper that loops over the results of a selector. I use this URL: https://www.visitnh.gov/things-to-do/food-and-drink/restaurants

Traceback (most recent call last):
File "c:/Users/dylan/Documents/Webscrape/web-s.py", line 19, in <module>
BusinessName = Results.select('p.item-title.ng-binding')[0].get_text()
IndexError: list index out of range

I tried changing the result wrapper to something different in the HTML, but it's not the same. I also tried messing around with the text in the select, but to no avail. Any ideas? Any help would be greatly appreciated.

Looking at the website, you can see that the element you are trying to select is not present when the document is first ready. The list of restaurants on this website is inserted into the page later, which is why your script cannot find it right away.
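One way to check this is to count how often the class appears in the raw HTML before any JavaScript has run. The sketch below uses only the standard library; `ClassCounter`, `count_class`, and the sample markup are hypothetical stand-ins, not the real page source.

```python
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count start tags whose class attribute contains a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.count += 1


def count_class(html, class_name):
    """Return the number of elements in `html` carrying `class_name`."""
    parser = ClassCounter(class_name)
    parser.feed(html)
    return parser.count


# Hypothetical markup standing in for what the server sends before scripts run:
# the wrapper exists, but no result items have been inserted yet.
static_html = '<div class="results-wrapper"><div ng-repeat="r in results"></div></div>'

print(count_class(static_html, "item-title"))  # 0
```

If the count is zero for the HTML that `requests` returns, a selector for that class yields an empty list, and indexing it with `[0]` raises the IndexError shown in the question.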

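Try waiting a bit before selecting, for example with the built-in time.sleep function. Note that waiting only helps when the page is loaded in a real browser, such as the Selenium driver your script creates; requests does not execute JavaScript, so the HTML it returns never contains the list.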
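A fixed sleep is either too short or wastes time, so polling until a condition holds is often more robust. Below is a minimal sketch of that idea; `wait_until` is a hypothetical helper, not part of any library, and the demo condition only simulates content that appears on the third check.

```python
import time


def wait_until(condition, timeout=10.0, interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(interval)


# Simulated condition: becomes true on the third check.
attempts = []


def content_ready():
    attempts.append(1)
    return len(attempts) >= 3


wait_until(content_ready, timeout=2.0, interval=0.01)
print(len(attempts))  # 3
```

With Selenium, the condition would be something that queries the live page through the driver. Selenium also ships its own `WebDriverWait` class for this purpose, which is probably preferable to a hand-rolled loop.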

The data is loaded from an external URL via JavaScript. You can use the requests and json modules to simulate this request:

import json
import requests


url = 'https://www.visitnh.gov/BusinessListingService.asmx/GetUsers'
search_params = {"searchParams":{"ResultsPerPage":15,"BusinessType":"attraction","SubcategoryID":25}}
data = json.loads(requests.post(url, json=search_params).json()['d'])

# uncomment this to print all data:
# print(json.dumps(data, indent=4))

for result in data['Results']:
    print(result['BusinessName'])
    print(result['Address'])
    print(result['Phone'])
    print('-' * 80)

Prints:

NazBar & Grill
1086 Weirs Blvd, Laconia, NH 03246
(603) 366-4341
--------------------------------------------------------------------------------
Giant of Siam
5 E Hollis Street, Nashua, NH 03060
(603) 595-2222
--------------------------------------------------------------------------------
Hobbs Tavern & Brewing Company
2415 NH Route 16, Ossipee, NH 03890
(603) 539-2000
--------------------------------------------------------------------------------

...and so on. (Total 164 items.)
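Since the question wanted the data in a CSV file, the decoded results can be passed to csv.DictWriter. The sketch below assumes the response shape implied by the code above (a JSON object whose 'd' field is itself a JSON string with a 'Results' list); `decode_results` and `write_results` are hypothetical helper names, and the sample body is built by hand from the first printed entry instead of calling the live service.

```python
import csv
import io
import json

FIELDNAMES = ["BusinessName", "Address", "Phone"]


def decode_results(response_body):
    """Unwrap the payload: the outer JSON holds a JSON string under 'd'."""
    outer = json.loads(response_body)
    return json.loads(outer["d"])["Results"]


def write_results(results, file):
    """Write the selected fields of each result as a semicolon-delimited row."""
    writer = csv.DictWriter(file, fieldnames=FIELDNAMES, delimiter=";")
    writer.writeheader()
    for result in results:
        # Keep only the columns we want; missing keys become empty strings
        writer.writerow({name: result.get(name, "") for name in FIELDNAMES})


# Hand-built sample mimicking the assumed response shape (not live data).
sample_body = json.dumps({"d": json.dumps({"Results": [
    {"BusinessName": "NazBar & Grill",
     "Address": "1086 Weirs Blvd, Laconia, NH 03246",
     "Phone": "(603) 366-4341"},
]})})

buffer = io.StringIO()
write_results(decode_results(sample_body), buffer)
print(buffer.getvalue())

# To write a real file, open it with newline="" as the csv docs recommend:
# with open("find.csv", "w", newline="") as f:
#     write_results(data["Results"], f)
```

The double json.loads mirrors the answer's `json.loads(requests.post(...).json()['d'])`: the first decode yields the wrapper object, the second decodes the string stored under 'd'.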
