How do I web-scrape a JSP with Python, Selenium and BeautifulSoup?

I'm an absolute beginner experimenting with web scraping in Python. I'm trying to extract the locations of ATMs from this URL:

https://www.visa.com/atmlocator/mobile/index.jsp#(page:results,params:(query:'Tokyo,%20Japan'))

using the following code:

#Script to scrape locations and addresses from VISA's ATM locator


# import the necessary libraries (to be installed if not available):

from selenium import webdriver
from bs4 import BeautifulSoup
import pandas as pd


#ChromeDriver
#(see https://chromedriver.chromium.org/getting-started as reference)

driver = webdriver.Chrome("C:/Users/DefaultUser/Local Settings/Application Data/Google/Chrome/Application/chromedriver.exe")

offices=[]   #list to hold branch/ATM names
addresses=[] #list to hold branch/ATM addresses
driver.get("https://www.visa.com/atmlocator/mobile/index.jsp#(page:results,params:(query:'Tokyo,%20Japan'))") 


content = driver.page_source
soup = BeautifulSoup(content, features = "lxml")


#the following loop extracts the content inside the tags that hold the requested information

for a in soup.findAll('li',attrs={'class':'visaATMResultListItem'}): 
    name=a.find('li', attrs={'class':'data-label'}) 
    address=a.find('li', attrs={'class':'data-label'}) 
    offices.append(name.text)
    addresses.append(address.text)


#next row defines the dataframe with the results of the extraction

df = pd.DataFrame({'Office':offices,'Address':addresses})


#next row displays dataframe content

print(df)


#export data to .CSV file named 'branches.csv'
with open('branches.csv', 'a') as f:
    df.to_csv(f, header=True)

The script seems to work correctly at first, since Chromedriver starts and shows the results in the browser as required, but no result is returned:

Empty DataFrame
Columns: [Office, Address]
Index: []
Process finished with exit code 0

Maybe I made a mistake in choosing the selectors?

Thank you very much for your help.

The problem is with the locators; use:

for a in soup.findAll('li',attrs={'class':'visaATMResultListItem'}): 
    name = a.find('p', attrs={'class':'visaATMPlaceName'}) 
    address = a.find('p', attrs={'class':'visaATMAddress'}) 
    offices.append(name.text)
    addresses.append(address.text)
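To see why the original selectors came back empty, here is a small self-contained illustration. The HTML snippet below is invented, modeled on the markup discussed above: `find('li', attrs={'class': 'data-label'})` matches nothing, while the corrected `p`-based selectors find the text.

```python
from bs4 import BeautifulSoup

# Invented snippet mimicking one result item from the ATM locator page
html = """
<li class="visaATMResultListItem">
  <a class="visaATMPlaceLink"><p class="visaATMPlaceName">SEVEN BANK</p></a>
  <p class="visaATMAddress">1-2-3 Shibuya
Tokyo, Japan</p>
</li>
"""
soup = BeautifulSoup(html, "html.parser")
item = soup.find("li", attrs={"class": "visaATMResultListItem"})

# The original selector looks for a nested <li class="data-label">, which
# does not exist in this markup, so find() returns None
print(item.find("li", attrs={"class": "data-label"}))  # -> None

# The corrected selector targets the <p> element that actually holds the name
print(item.find("p", attrs={"class": "visaATMPlaceName"}).text)  # -> SEVEN BANK
```

This is why `name.text` raised no error in the original script only because the loop body never ran at all: the outer `findAll` matched, but appending `None.text` would have failed had the inner selectors been reached with no match.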
A complete working script, using headless Firefox and a short wait so the JavaScript-rendered results are present before parsing:

from selenium import webdriver
from selenium.webdriver.firefox.options import Options
import time
from bs4 import BeautifulSoup
import csv

options = Options()
options.add_argument('--headless')

driver = webdriver.Firefox(options=options)
driver.get("https://www.visa.com/atmlocator/mobile/index.jsp#(page:results,params:(query:'Tokyo,%20JAPAN'))")
time.sleep(2)

soup = BeautifulSoup(driver.page_source, 'html.parser')

na = []
addr = []
for name in soup.findAll("a", {'class': 'visaATMPlaceLink'}):
    na.append(name.text)
for add in soup.findAll("p", {'class': 'visaATMAddress'}):
    addr.append(add.get_text(strip=True, separator=" "))

with open('out.csv', 'w', newline="") as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Address'])
    for _na, _addr in zip(na, addr):
        writer.writerow([_na, _addr])

driver.quit()
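One detail worth noting in the script above: the addresses are collected with `get_text(strip=True, separator=" ")` rather than `.text`, because an address split across nested tags would otherwise be run together with no spaces. A quick illustration on an invented snippet:

```python
from bs4 import BeautifulSoup

# Invented address markup with nested tags, as multi-line addresses often have
soup = BeautifulSoup(
    '<p class="visaATMAddress"><span>1-2-3 Shibuya</span><span>Tokyo</span></p>',
    "html.parser",
)
addr = soup.find("p", attrs={"class": "visaATMAddress"})

print(addr.text)                                 # -> 1-2-3 ShibuyaTokyo
print(addr.get_text(strip=True, separator=" "))  # -> 1-2-3 Shibuya Tokyo
```

`strip=True` trims whitespace around each text fragment and `separator=" "` joins the fragments with single spaces, giving a clean one-line address for the CSV.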
