
How to use BeautifulSoup to scrape the latitude/longitude data from an HTML page

I'm trying to scrape the latitude and longitude values from this website:

http://www.healthgrades.com/provider-search-directory/search?q=Dentistry&prof.type=provider&search.type=&method=&loc=New+York+City%2C+NY+&pt=40.71455%2C-74.007118&isNeighborhood=&locType=%7Cstate%7Ccity&locIsSolrCity=false

For each provider, if you inspect the element, it looks like this:

div class="listing" data-lat="40.66862" data-lng="-73.98574" data-listing="22"

How can I use BeautifulSoup to get the latitude and longitude values here?

I tried using a regular expression in my script.

Here is my script:

Geo = soup.find("div", class_="providerSearchResults")
print Geo.findAll("div", data-lat_= re.compile('[0-9.]'))

But I get this error message: "SyntaxError: keyword can't be an expression"

Also, the "div" element is not the same for every provider. It can be:

div class="listing" data-lat="40.66862" data-lng="-73.98574" data-listing="22"

or

div class="listingfirst" data-lat="40.66862" data-lng="-73.98574" data-listing="22"

or even

div class="listing enhancedlisting" data-lat="40.66862" data-lng="-73.98574" data-listing="22"

First, a few requirements:

pip install requests
pip install beautifulsoup4
pip install lxml

latlongbs4.py:

import requests
from bs4 import BeautifulSoup

# Fetch the search results page
r = requests.get('http://www.healthgrades.com/provider-search-directory/search?q=Dentistry&prof.type=provider&search.type=&method=&loc=New+York+City%2C+NY+&pt=40.71455%2C-74.007118&isNeighborhood=&locType=%7Cstate%7Ccity&locIsSolrCity=false')
soup = BeautifulSoup(r.text, 'lxml')
# Match any tag that has both data-lat and data-lng, whatever its class
latlonglist = soup.find_all(attrs={"data-lat": True, "data-lng": True})
for latlong in latlonglist:
    print(latlong['data-lat'], latlong['data-lng'])

Edit: removed class from the attrs dictionary.
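Dropping class from the filter matters because the question shows three different class values on the same kind of element. The following is a minimal offline sketch of that idea, not the script that produced the output in this answer. It assumes a made-up SAMPLE_HTML snippet modelled on the div variants quoted in the question, a hypothetical helper named extract_coordinates, and the built-in 'html.parser' so that lxml is not needed to try it:

```python
from bs4 import BeautifulSoup

# Made-up sample markup modelled on the <div> variants quoted in the question.
# The data-listing values and the last <div> are illustrative assumptions.
SAMPLE_HTML = """
<div class="providerSearchResults">
  <div class="listing" data-lat="40.66862" data-lng="-73.98574" data-listing="22"></div>
  <div class="listingfirst" data-lat="40.71851" data-lng="-74.00984" data-listing="1"></div>
  <div class="listing enhancedlisting" data-lat="40.77536" data-lng="-73.97707" data-listing="2"></div>
  <div class="pagination">no coordinates here</div>
</div>
"""


def extract_coordinates(html):
    """Return (lat, lng) float pairs for every tag carrying both attributes."""
    soup = BeautifulSoup(html, "html.parser")
    # True means "the attribute is present", regardless of its value or the tag's class.
    tags = soup.find_all(attrs={"data-lat": True, "data-lng": True})
    return [(float(tag["data-lat"]), float(tag["data-lng"])) for tag in tags]


coordinates = extract_coordinates(SAMPLE_HTML)
for lat, lng in coordinates:
    print(lat, lng)
```

All three class variants are picked up, and the div without coordinates is skipped, because the filter only looks at whether data-lat and data-lng are present.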

Output:

(latlongbs4)macbook:latlongbs4 joeyoung$ python latlongbs4.py
40.71851 -74.00984
40.77536 -73.97707
40.71961 -74.00347
40.71395 -74.008
40.711614 -74.015901
40.724576 -74.001771
40.7175 -74.00087
40.71961 -74.00347
40.71766 -73.99293
40.71961 -74.00347
40.71848 -73.99648
40.709917 -74.009884
40.71553 -74.00977
40.71702 -73.996
40.71254 -73.99994
40.70869 -74.01164
40.70994 -74.00764
40.707325 -74.003982
40.7184 -74.00098
40.71373 -74.00812
40.710474 -74.009844
40.7175 -74.00087
40.727582 -73.894632
40.763469 -73.963106
40.724853 -73.841097

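A few notes:

I used the attrs keyword with a dictionary because:

Some attributes, like the data-* attributes in HTML 5, have names that can't be used as the names of keyword arguments:

You can use these attributes in searches by putting them into a dictionary and passing the dictionary into find_all() as the attrs argument:

Source: http://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-keyword-arguments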
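To make the note above concrete, here is a small sketch of why the keyword form from the question fails and two spellings that work. It is an illustration under assumptions rather than part of the original answer: the one-line HTML string reuses the div from the question, 'html.parser' is chosen only to avoid the lxml dependency, and the CSS selector via select() is a different technique from the attrs dictionary, shown only for comparison.

```python
from bs4 import BeautifulSoup

# Single <div> copied from the question, wrapped in angle brackets for parsing.
HTML = '<div class="listing" data-lat="40.66862" data-lng="-73.98574" data-listing="22"></div>'
soup = BeautifulSoup(HTML, "html.parser")

# Python reads `data-lat_=...` as the subtraction `data - lat_` on the left of `=`,
# which is not a valid keyword name, so the call is rejected before it ever runs.
try:
    compile("soup.find_all('div', data-lat_=True)", "<demo>", "eval")
    keyword_form_compiles = True
except SyntaxError:
    keyword_form_compiles = False

# Working spelling 1: pass the hyphenated names through the attrs dictionary.
via_attrs = soup.find_all("div", attrs={"data-lat": True, "data-lng": True})

# Working spelling 2 (a different technique): a CSS attribute selector.
via_css = soup.select("div[data-lat][data-lng]")

print(keyword_form_compiles)
print([(tag["data-lat"], tag["data-lng"]) for tag in via_attrs])
print([(tag["data-lat"], tag["data-lng"]) for tag in via_css])
```

The exact wording of the SyntaxError differs between Python versions, so the sketch only checks that one is raised.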

