Scraping addresses on a county scale w/ Python & BS4

Okay, so I am trying to create a 'fixture finder' website for rugby teams and players across the UK. I am currently attempting to implement a web scraper using Python and BeautifulSoup to scrape Google for the clubs' addresses, which would then be geocoded and inserted into the database as longitude and latitude for the maps API to plot for the user.

My question is: is there a way I could simply use the Google geocoder API to retrieve the long & lat of ALL the clubs in a specific county, and then parse the generated page with BeautifulSoup to retrieve the long and lat (and then rinse and repeat for all the counties in the UK), following the example at https://pypi.python.org/pypi/geocoder/1.8.0#downloads but on a county scale?

OR could someone shed some light on scraping the listings off Google Maps? I'm getting the general feeling that Maps isn't allowed to be scraped.

Any insight would be greatly appreciated.

Using the Python Client for Google Maps Services and the code below, I got names and locations (and more) for the query 'Rugby Club, London'.

You will have to create your own project on developers.google.com/console, activate the Places API for Web Service (there is no version for Desktop Application) and get a credential for the Places API; it gives you your own key= (API key).

The current key= is active so you can test the code, but I will deactivate it later.
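Once you have your own key, you may prefer not to paste it into the script. A minimal sketch, assuming the key is stored in an environment variable; the variable name GOOGLE_MAPS_API_KEY and the helper load_api_key are made up for illustration and are not part of the googlemaps library:

```python
import os


def load_api_key(env_var='GOOGLE_MAPS_API_KEY'):
    """Return the API key stored in the given environment variable.

    The variable name is an assumption; use whatever name suits your setup.
    """
    key = os.environ.get(env_var, '').strip()
    if not key:
        raise RuntimeError(
            'Set the %s environment variable to your API key' % env_var
        )
    return key
```

The returned value can then be passed as googlemaps.Client(key=load_api_key()).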

import googlemaps

gmaps = googlemaps.Client(key='AIzaSyBiC8vKEEF-MLP9a2de0PLs-S_XrEL0kSQ')

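# Text search; the response is a dict whose 'results' list holds the places
results = gmaps.places('Rugby Club, London')

# Show which keys are available on the first result
for key in results['results'][0].keys():
    print('key:', key)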

print('-----')

for item in results['results']:
    print('name:', item['name'])
    print('lat:', item['geometry']['location']['lat'])
    print('lng:', item['geometry']['location']['lng'])
    print('location:', item['geometry']['location'])
    print('---')

print('-----')

#for item in results['results'][:1]:
#    for key, value in item.items():
#        print(key, ':', value)

Result (available keys, then some names and locations):

key: formatted_address
key: geometry
key: icon
key: id
key: name
key: opening_hours
key: photos
key: place_id
key: rating
key: reference
key: types
-----
name: East London Rugby Football Club
lat: 51.5291765
lng: 0.0102242
location: {'lat': 51.5291765, 'lng': 0.0102242}
---
name: Hampstead Rugby Football Club
lat: 51.5571358
lng: -0.1555037
location: {'lat': 51.5571358, 'lng': -0.1555037}
---
name: Chiswick Rugby Club
lat: 51.47323
lng: -0.256633
location: {'lat': 51.47323, 'lng': -0.256633}
---
name: Wimbledon Rugby Football Club
lat: 51.41975009999999
lng: -0.2464434
location: {'lat': 51.41975009999999, 'lng': -0.2464434}
---
name: Saracens Amateur RFC
lat: 51.64230209999999
lng: -0.1429848
location: {'lat': 51.64230209999999, 'lng': -0.1429848}
---
name: Kilburn Cosmos RFC
lat: 51.55542000000001
lng: -0.2297043000000001
location: {'lat': 51.55542000000001, 'lng': -0.2297043000000001}
---
name: Barnes Rugby Football Club
lat: 51.47568860000001
lng: -0.2373847
location: {'lat': 51.47568860000001, 'lng': -0.2373847}
---
name: Southwark Tigers Rugby Club
lat: 51.4839377
lng: -0.07720149999999999
location: {'lat': 51.4839377, 'lng': -0.07720149999999999}
---
name: HACKNEY RFC
lat: 51.5732467
lng: -0.0611062
location: {'lat': 51.5732467, 'lng': -0.0611062}
---
name: UCS Old Boys Rugby Club
lat: 51.5575127
lng: -0.2022654
location: {'lat': 51.5575127, 'lng': -0.2022654}
---
name: Millwall Rugby Club
lat: 51.487884
lng: -0.010493
location: {'lat': 51.487884, 'lng': -0.010493}
---
name: Haringey Rhinos RFC
lat: 51.604738
lng: -0.099553
location: {'lat': 51.604738, 'lng': -0.099553}
---
name: Finchley RFC
lat: 51.6067705
lng: -0.1698911
location: {'lat': 51.6067705, 'lng': -0.1698911}
---
name: Trailfinders Rugby Club
lat: 51.520878
lng: -0.306115
location: {'lat': 51.520878, 'lng': -0.306115}
---
name: Old Ruts Rugby Club
lat: 51.4079431
lng: -0.1993505
location: {'lat': 51.4079431, 'lng': -0.1993505}
---
name: Ealing Trailfinders Rugby Club
lat: 51.524832
lng: -0.3293849999999999
location: {'lat': 51.524832, 'lng': -0.3293849999999999}
---
name: Chingford Rugby Football Club
lat: 51.6301123
lng: -0.0171661
location: {'lat': 51.6301123, 'lng': -0.0171661}
---
name: Old Elthamians RFC Senior Rugby
lat: 51.43445149999999
lng: 0.0296538
location: {'lat': 51.43445149999999, 'lng': 0.0296538}
---
name: Eton Manor RFC
lat: 51.579528
lng: 0.03874
location: {'lat': 51.579528, 'lng': 0.03874}
---
name: London Skolars Rugby League Club
lat: 51.60465900000001
lng: -0.100032
location: {'lat': 51.60465900000001, 'lng': -0.100032}
---
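To take this to a county scale, the same search could be repeated once per county, with no BeautifulSoup needed because the response is already structured data. Below is a rough sketch, not a tested production script. It assumes the client's places() method accepts a page_token argument and that the response may carry a next_page_token, as the Places text search does. The helper names (extract_club, search_all_pages, clubs_by_county), the query template and the 2 second delay are my own choices for illustration. As far as I know a text search returns results in pages of about 20 and a fresh token needs a short wait before it becomes valid, so check the current Places API documentation and quotas before relying on this.

```python
import time


def extract_club(item):
    """Flatten one Places result into the fields the database needs."""
    location = item['geometry']['location']
    return {
        'place_id': item.get('place_id'),
        'name': item['name'],
        'address': item.get('formatted_address'),
        'lat': location['lat'],
        'lng': location['lng'],
    }


def search_all_pages(client, query, max_pages=3, delay=2.0):
    """Run a text search and follow next_page_token for further pages."""
    clubs = []
    response = client.places(query)
    for page in range(max_pages):
        clubs.extend(extract_club(item) for item in response.get('results', []))
        token = response.get('next_page_token')
        if not token or page == max_pages - 1:
            break
        # Assumption: a new token is not usable immediately, so wait briefly
        time.sleep(delay)
        response = client.places(query, page_token=token)
    return clubs


def clubs_by_county(client, counties, template='Rugby Club, {}', **kwargs):
    """Search each county in turn and drop duplicates by place_id."""
    seen = {}
    for county in counties:
        for club in search_all_pages(client, template.format(county), **kwargs):
            # Fall back to name and coordinates if a result has no place_id
            key = club['place_id'] or (club['name'], club['lat'], club['lng'])
            # Keep the first occurrence and record which county query found it
            seen.setdefault(key, dict(club, county=county))
    return list(seen.values())
```

With the gmaps client from the code above, clubs_by_county(gmaps, ['Kent', 'Surrey']) would return a list of dicts holding name, address, lat and lng, ready to be inserted into the database. Clubs near a county border may turn up under more than one query, which is why duplicates are dropped by place_id.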
