
How to web scrape a map using two drop-down menus?

I am new to web scraping. I have been trying to extract business information such as ShopName and the business Address. There are two drop-down menus: one corresponds to the province and the other to the district. I went to the Network tab in the developer tools. Once I selected a province and a district, I got a JSON response, and I tested some code to extract the ShopName and Address from it. Here's the code for the first province and first district:

import requests


headers = {
    'Connection': 'keep-alive',
    'Accept': '*/*',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.111 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Dest': 'empty',
    'Referer': 'https://xn--b3cuh3bhdeppad0as7a5dybu5qd4a3kl8e.moc.go.th/MarkerShop',
    'Accept-Language': 'en-US,en;q=0.9,hi;q=0.8,bg;q=0.7,sv;q=0.6',
}

params = (
    ('prov', '81'),
    ('amphur', '8101'),
)

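response = requests.get('https://xn--b3cuh3bhdeppad0as7a5dybu5qd4a3kl8e.moc.go.th/api/shopapi/getshopicon', headers=headers, params=params)
response.raise_for_status()

# The endpoint returns a JSON list of shop records
details = response.json()

master_list=[]
for detail in details: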
    data_dict={}
    invalid_tags = ['\\r', '\\n', '<', '>', '-', ';','<br>','<b>','</b>','/b',"img src='icon/icon_y.png'"," ","brb","imgsrc='icon/icon_g.png'"]
    for invalid_tag in invalid_tags:
        detail['Address']  = detail['Address'].replace(invalid_tag, '')
        detail['ShopName'] = detail['ShopName'].replace(invalid_tag,'')
    data_dict["Address_new"]=detail['Address']
    data_dict['Shop']=detail['ShopName']
    master_list.append(data_dict)

The code above gives me the required output for the first province and its first district. I want to loop over the remaining provinces and their respective districts. I have been looking for answers, and what I see is that people use Selenium to step through the drop-down lists, or perhaps make the AJAX requests directly. I am not familiar with AJAX beyond some preliminary information. Please suggest how to get the required information.

Here's the link (the site is in Thai). It displays all the business information on a map.

https://xn--b3cuh3bhdeppad0as7a5dybu5qd4a3kl8e.moc.go.th/MarkerShop#gomap

Try this:

import requests as r
from bs4 import BeautifulSoup

my_url = 'https://xn--b3cuh3bhdeppad0as7a5dybu5qd4a3kl8e.moc.go.th/MarkerShop'

res = r.get(my_url)
soup = BeautifulSoup(res.text, "html.parser")
provinces = soup.find_all('option')[1:]
provinces.pop()
invalid_tags = ['\\r', '\\n', '<', '>', '-', ';','<br>','<b>','</b>','/b',"img src='icon/icon_y.png'"," ","brb","imgsrc='icon/icon_g.png'"]
master_list=[]
for province in provinces:
    res = r.get('https://xn--b3cuh3bhdeppad0as7a5dybu5qd4a3kl8e.moc.go.th/api/shopapi/getshopicon?prov={}&amphur=0'.format(province["value"]))
    data = res.json()
    for detail in data:
        data_dict={}
        for invalid_tag in invalid_tags:
            detail['Address']  = detail['Address'].replace(invalid_tag, '')
            detail['ShopName'] = detail['ShopName'].replace(invalid_tag,'')
        data_dict["Address_new"]=detail['Address']
        data_dict['Shop']=detail['ShopName']
        master_list.append(data_dict)
print(master_list)

This gets the data for each province, including all of its districts. The response is JSON, so you can extract whatever you need from it.
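As a possible follow-up, the cleanup step can be pulled into small functions that work on the parsed JSON, so it can be checked without hitting the site. This is only a sketch: `clean_field` and `extract_shops` are hypothetical names, the sample record is made up, and it swaps the list of `replace` calls for a regular expression that strips anything shaped like an HTML tag. Unlike the original list, it keeps spaces and hyphens inside names and addresses.

```python
import re

TAG_RE = re.compile(r"<[^>]*>")  # anything shaped like an HTML tag, e.g. <br> or <img ...>
WS_RE = re.compile(r"\s+")       # runs of whitespace, including real \r and \n


def clean_field(text):
    """Remove HTML tags and collapse whitespace in a single field."""
    # Also handle literal backslash sequences, as the original list does
    text = text.replace("\\r", " ").replace("\\n", " ")
    text = TAG_RE.sub(" ", text)
    return WS_RE.sub(" ", text).strip()


def extract_shops(records):
    """Turn parsed JSON records into dicts with the keys used above."""
    return [
        {
            "Shop": clean_field(rec.get("ShopName") or ""),
            "Address_new": clean_field(rec.get("Address") or ""),
        }
        for rec in records
    ]


if __name__ == "__main__":
    # Made-up record, only to show the shape of the input
    sample = [{
        "ShopName": "<img src='icon/icon_y.png'><b>Example Shop</b>",
        "Address": "12 Example Rd<br>\r\nExample District",
    }]
    print(extract_shops(sample))
```

Inside the province loop, `master_list.extend(extract_shops(res.json()))` would then replace the inner `for detail in data` block.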
