
How do I scrape data from a web page that refreshes after submitting data via Selenium and Python?

I'm developing a geolocation web scraper with Python and Selenium. When I enter data on this website (https://www.latlong.net/), the page refreshes (with the same URL), and when I try to read the data from the latitude and longitude inputs, it prints nothing.

Here's the sample output; it returns an empty string.

I did notice that the value attribute changes after entering data in:

<input id="place" name="place" type="text" placeholder="Type a place name" class="width70" style="text-transform:capitalize;" value="" required="">

Should I manipulate that? Thank you :)

Here's my code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import time

counter = 0

locations = [

    'Republic of the Philippines',
    'Heaven',
    'Philippines',
]

latitude = []
longtitude = []

browser = webdriver.Chrome('C://Users/user1/Portable Python 3.7.0     x64/App/Python/Lib/site-packages/chromedriver')

url = 'https://www.latlong.net/'

for i in locations:

    browser.get(url)
    bar = browser.find_element_by_id('place')
    bar.send_keys(i)
    bar.send_keys(Keys.ENTER)
    time.sleep(3)
    lat = browser.find_element_by_id('lat')
    lng = browser.find_element_by_id('lng')

    time.sleep(3)

    latitude.append(lat.text)
    longtitude.append(lng.text)

    print(latitude[counter])
    print(longtitude[counter])

    counter+=1

    browser.refresh()

You can do a POST request:

import requests
from bs4 import BeautifulSoup as bs
import re

url = 'https://www.latlong.net/'
locations = ['Republic of the Philippines', 'Heaven', 'Philippines']
latitude = []
longitude = []

with requests.Session() as sess:

    for i in locations: 
        r = sess.get(url)
        soup = bs(r.content, 'lxml')
        token = soup.select_one('#lltoken')['value']
        data = { 'place': i, 'lltoken': token }
        r = sess.post(url, data = data)
        s = r.text

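        try:
            # The page passes the coordinates to an inline call sm(<lat>,<lon>,<zoom>)
            lat_lon = re.findall( r'sm\((-?\d+\.\d+),(-?\d+\.\d+)', s)[0]
            lat = lat_lon[0]
            lon = lat_lon[1]
            latitude.append(lat)
            longitude.append(lon)
        except IndexError:
            # No sm(...) call matched: print the response to inspect it
            print(s)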

print(latitude)
print(longitude)
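The parsing step above can be pulled into a small helper so that a response without coordinates is handled explicitly. This is only a sketch: `parse_sm_call` and `SM_CALL` are hypothetical names, and it assumes the response still contains an inline call of the form sm(<lat>,<lon>,<zoom>), which is what the regex above targets.

```python
import re

# Matches the inline call sm(<lat>,<lon>,...) that the regex above targets.
SM_CALL = re.compile(r'sm\((-?\d+\.\d+),(-?\d+\.\d+)')


def parse_sm_call(html):
    """Return (lat, lon) as floats, or None when no sm(...) call is found.

    parse_sm_call is a hypothetical helper name, not part of any library.
    """
    match = SM_CALL.search(html)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Sample text modelled on the page's inline script.
sample = "mymap.on('click', onMapClick); sm(14.693390,121.067238,12)"
print(parse_sm_call(sample))
print(parse_sm_call("<html>no coordinates here</html>"))
```

Returning None for a miss lets the caller decide whether to skip the location or log the response, rather than relying on an exception.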

Selenium:

You can grab them from the src of the map iframe. There doesn't appear to be a need for wait conditions, but you may need to consider adding them (I will happily add them to show you).

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
import re

locations = [
    'Republic of the Philippines',
    'Heaven',
    'Philippines',
]

latitude = []
longitude = []

url = 'https://www.latlong.net/'

browser = webdriver.Chrome()
browser.get(url)

for i in locations:
    # find_element(By.ID, ...) replaces find_element_by_id, which Selenium 4 removed
    bar = browser.find_element(By.ID, 'place')
    bar.clear()
    bar.send_keys(i)
    bar.send_keys(Keys.ENTER)
    # The map iframe's src carries the coordinates as center=<lat>,<lon>
    s = browser.find_element(By.ID, 'latlongmape').get_attribute('src')
    lat_lon = re.findall( r'(-?\d+\.\d+)', s)
    lat = lat_lon[0]
    lon = lat_lon[1]
    latitude.append(lat)
    longitude.append(lon)

print(latitude)
print(longitude)
browser.quit()
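The regex above takes the first two decimal numbers anywhere in the src, which works as long as nothing else in the URL looks like a decimal. As a hedged alternative, the coordinates can be read from the `center` query parameter directly. The sketch below assumes the src has the form the page's script builds (`...&center=<lat>,<lon>&zoom=<zoom>`); `coords_from_map_src` is a hypothetical name, and `YOUR_KEY` is a placeholder rather than a real key.

```python
from urllib.parse import urlparse, parse_qs


def coords_from_map_src(src):
    """Return (lat, lon) floats from the 'center' query parameter, or None.

    coords_from_map_src is a hypothetical helper, not part of Selenium.
    """
    query = parse_qs(urlparse(src).query)
    center = query.get('center')
    if not center:
        return None
    parts = center[0].split(',')
    if len(parts) != 2:
        return None
    return float(parts[0]), float(parts[1])


# Hypothetical src value; YOUR_KEY stands in for the real API key.
src = ('https://www.google.com/maps/embed/v1/view?key=YOUR_KEY'
       '&maptype=satellite&center=14.693390,121.067238&zoom=12')
print(coords_from_map_src(src))
```

In the loop above this would be called as `coords_from_map_src(s)` in place of the `re.findall` line.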

With wait conditions, reading the coordinates from a different element:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import re
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

locations = [

    'Republic of the Philippines',
    'Heaven',
    'Philippines',
]

latitude = []
longitude = []

url = 'https://www.latlong.net/'

browser = webdriver.Chrome()
browser.get(url)

for i in locations:
    bar = WebDriverWait(browser,5).until(EC.presence_of_element_located((By.ID, "place")))
    bar.clear()
    bar.send_keys(i)
    bar.send_keys(Keys.ENTER)
    s = WebDriverWait(browser,5).until(EC.presence_of_element_located((By.ID, "coordinateslink"))).text
    lat_lon = re.findall( r'(-?\d+\.\d+)', s)
    lat = lat_lon[0]
    lon = lat_lon[1]
    latitude.append(lat)
    longitude.append(lon)

print(latitude)
print(longitude)
browser.quit()
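One caveat: `presence_of_element_located` only checks that the element exists. If the element can be present before its text has been updated for the new search (an assumption, not something verified against the site), the loop could read an empty or stale value. A custom wait condition is one way around that. In this sketch, `text_changed` is a hypothetical helper; `WebDriverWait.until()` accepts any callable that takes the driver and returns a truthy value when done.

```python
def text_changed(element_id, previous_text):
    """Build a wait condition for WebDriverWait.until().

    The returned callable yields the element's text once it is non-empty
    and differs from previous_text, and False otherwise.
    text_changed is a hypothetical helper, not a Selenium built-in.
    """
    def condition(driver):
        # "id" is the locator strategy string behind By.ID
        text = driver.find_element("id", element_id).text
        return text if text and text != previous_text else False
    return condition


# Intended use inside the loop above (needs a live browser, so not run here):
#   s = WebDriverWait(browser, 5).until(text_changed("coordinateslink", previous))
#   previous = s
```

Note that searching for the same place twice in a row would make this condition time out, since the text never changes.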

You could also use JavaScript to return the values:

lat = browser.execute_script("return document.getElementById('lat').value;")
lon = browser.execute_script("return document.getElementById('lng').value;")
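If you prefer a single round trip to the browser, both values can come back from one `execute_script` call, since a JavaScript array is returned to Python as a list. A minimal sketch, where `read_lat_lng` is a hypothetical wrapper and the element ids are the ones used above:

```python
def read_lat_lng(driver):
    """Return (lat, lng) strings read from the two inputs via JavaScript.

    read_lat_lng is a hypothetical wrapper; driver is any Selenium WebDriver.
    """
    script = (
        "return [document.getElementById('lat').value,"
        " document.getElementById('lng').value];"
    )
    lat, lng = driver.execute_script(script)
    return lat, lng


# Intended use (needs a live browser, so not run here):
#   lat, lon = read_lat_lng(browser)
```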

You can also regex them out of one of the script tags, where they are passed to sm(...):

lat_lon = re.findall( r'sm\((-?\d+\.\d+),(-?\d+\.\d+)', browser.page_source)[0]
lat = lat_lon[0]
lon = lat_lon[1]
print(lat, lon)

Places where the values are found:

You can see all the different places where JavaScript assigns the coordinate values in the following script:

<script>
var mymap = L.map('latlongmap');
var mmr = L.marker([0,0]);
mmr.bindPopup('0,0');
mmr.addTo(mymap);
L.tileLayer('https://{s}.tile.openstreetmap.org/{z}/{x}/{y}.png?{foo}', {foo: 'bar', attribution:'&copy; <a href="https://www.openstreetmap.org/copyright">OpenStreetMap</a>'}).addTo(mymap);
mymap.on('click', onMapClick);
sm(14.693390,121.067238,12);
function isll(num) { var val = parseFloat(num); if (!isNaN(val) && val <= 90 && val >= -90) return true; else return false; }
function onMapClick(e) { mmr.setLatLng(e.latlng); setui(e.latlng.lat,e.latlng.lng,mymap.getZoom()); }
function dec2dms(e,t) { document.getElementById("dms-lat").innerHTML = getdms(e, !0), document.getElementById("dms-lng").innerHTML = getdms(t, !1) }
function getdms(e, t) { var n = 0, m = 0, l = 0, a = "X"; return a = t && 0 > e ? "S" : !t && 0 > e ? "W" : t ? "N" : "E", d = Math.abs(e), n = Math.floor(d), l = 3600 * (d - n), m = Math.floor(l / 60), l = Math.round(1e4 * (l - 60 * m)) / 1e4, n + "&deg; " + m + "' " + l + "'' " + a }
function sm(lt,ln,zm) { setui(lt,ln,zm); mmr.setLatLng(L.latLng(lt,ln)); mymap.setView([lt,ln], zm); }
function setui(lt,ln,zm) {
  lt = Number(lt).toFixed(6);
  ln = Number(ln).toFixed(6);
  mmr.setPopupContent(lt + ',' + ln).openPopup();
  document.getElementById("lat").value=lt;
  document.getElementById("lng").value=ln;
  document.getElementById("latlngspan").innerHTML ="(" + lt + ", " + ln + ")";
  document.getElementById("coordinatesurl").value = "https://www.latlong.net/c/?lat=" + lt + "&long=" + ln;
  document.getElementById("coordinateslink").innerHTML = '&lt;a href="https://www.latlong.net/c/?lat=' + lt + "&amp;long=" + ln + '" target="_blank"&gt;(' + lt + ", " + ln + ")&lt;/a&gt;";
  dec2dms(lt,ln);
  document.getElementById('latlongmape').src='https://www.google.com/maps/embed/v1/view?key=AIzaSyALrSTy6NpqdhIOUs3IQMfvjh71td2suzY&maptype=satellite&'+'center='+lt+','+ ln+'&zoom='+zm;
}
</script>

The problem is that if you inspect the element after sending Keys.ENTER, it has no text to read. The page somehow uses a different mechanism to replace the placeholder:

<div class="col-6 m2">
   <label for="lat">Latitude</label>
   <input type="text" name="lat" id="lat" placeholder="lat coordinate">
</div>
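A likely explanation is that the page's script assigns the coordinates to the input's `value` property, and Selenium's `.text` only returns an element's rendered text, which is empty for an `<input>`. If so, reading the value instead of the text should work. A minimal sketch, where `read_input_value` is a hypothetical helper and `driver` is any Selenium WebDriver:

```python
def read_input_value(driver, element_id):
    """Return the current value of an <input>, which .text does not expose.

    read_input_value is a hypothetical helper, not a Selenium built-in.
    """
    # "id" is the locator strategy string behind By.ID
    element = driver.find_element("id", element_id)
    return element.get_attribute("value")


# Intended use (needs a live browser, so not run here):
#   lat = read_input_value(browser, "lat")
#   lng = read_input_value(browser, "lng")
```

You may still need to wait for the page to finish updating before the value is populated.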

What you could do instead is find the element with id "latlngspan". It sits below the map and contains both values, lat and long, so you could perform a few simple string operations on it to get the format you need.
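Those string operations could look like the sketch below. It assumes the span's text has the form "(14.693390, 121.067238)", that is, two numbers in parentheses separated by a comma; `parse_coordinate_pair` is a hypothetical name.

```python
def parse_coordinate_pair(text):
    """Turn a string like '(14.693390, 121.067238)' into (lat, lon) floats.

    Raises ValueError if the text does not hold exactly two numbers.
    parse_coordinate_pair is a hypothetical helper name.
    """
    # Drop surrounding whitespace and parentheses, then split on the comma.
    lat, lon = text.strip().strip('()').split(',')
    return float(lat), float(lon)


print(parse_coordinate_pair('(14.693390, 121.067238)'))
```

With Selenium, the input string would be the `.text` of the "latlngspan" element.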

Would that work for you?

