How to scrape text from tooltips generated with JavaScript

I've written the following code to get the positions of all the blue markers on the map.

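from bs4 import BeautifulSoup
from requests_html import HTMLSession

session = HTMLSession()

url = "https://emf2.bundesnetzagentur.de/karte/Default.aspx?lat=52.4107723&lon=14.2930953&zoom=14"
r = session.get(url)
# Run the page's JavaScript in headless Chromium, then wait 3 s so the map can draw its markers
r.html.render(sleep=3)
data = r.html.html

soup = BeautifulSoup(data, 'html.parser')
# Each blue triangle marker is an <img> whose src is images/funk_hf.png
BlueTriangles = soup.find_all(src="images/funk_hf.png")
# Skip the first match
for Triangle in BlueTriangles[1:]:
    TriangleStyle = Triangle['style']
    # The inline style holds the marker's screen offset as "transform: translate3d(Xpx, Ypx, 0px); z..."
    PixelPosition = TriangleStyle.split('transform: translate3d(')[1].split(', 0px); z')[0]
    print(PixelPosition)

# Close the session (and the headless browser it started)
session.close()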
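The split-based parsing in the loop depends on the exact text around the transform (for instance, that the next property starts with "z"). A slightly more tolerant sketch extracts the X and Y offsets with a regular expression instead. The helper name pixel_position and the sample style string are made up for illustration; the only assumption carried over from the original code is the translate3d(Xpx, Ypx, 0px) form.

```python
import re

# Matches the "translate3d(Xpx, Ypx" part of an inline style attribute and
# captures X and Y as (possibly negative, possibly fractional) numbers.
TRANSLATE3D = re.compile(r"translate3d\(\s*(-?\d+(?:\.\d+)?)px,\s*(-?\d+(?:\.\d+)?)px")


def pixel_position(style):
    """Return (x, y) in pixels from a marker's style string, or None if absent."""
    match = TRANSLATE3D.search(style)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


# Hypothetical style string in the shape the question's code expects
sample = "margin-left: -8px; transform: translate3d(412px, 287px, 0px); z-index: 287;"
print(pixel_position(sample))  # (412.0, 287.0)
```

Inside the question's loop this would be print(pixel_position(Triangle['style'])). These are screen offsets within the rendered map, not geographic coordinates.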

When I open the URL in a web browser, I can see that each blue marker has a unique ID, which is shown in a tooltip on mouseover.

The tooltip's HTML appears to be generated only when a mouseover event fires.

Is there any way to scrape the ID from the tooltip? I was wondering whether it is possible to use the script parameter of render to force a mouseover event, but I couldn't find a way to integrate it into the code:

$('#foo').trigger('mouseover');

The points on the map come from a request to the endpoint https://emf2.bundesnetzagentur.de/karte/Standortservice.asmx/GetStandorteFreigabe that passes the coordinates of the visible box (in this case {"Box":{"sued":52.39231101879802,"west":14.248666763305664,"nord":52.42927461241364,"ost":14.337587356567385}}; sued, west, nord and ost are German for south, west, north and east).
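A minimal sketch of that request using only the standard library. It assumes the endpoint accepts the Box object above as a plain JSON POST body; if the page first passes the parameters through its CryptParams function, the body would have to be encrypted the same way, which this sketch does not do. The helper names box_payload and fetch_locations are hypothetical.

```python
import json
import urllib.request

ENDPOINT = "https://emf2.bundesnetzagentur.de/karte/Standortservice.asmx/GetStandorteFreigabe"


def box_payload(sued, west, nord, ost):
    """Build the request body: the visible map box in degrees (south, west, north, east)."""
    return {"Box": {"sued": sued, "west": west, "nord": nord, "ost": ost}}


def fetch_locations(payload):
    """POST the payload and return the parsed JSON response (location data still encrypted).

    Assumption: the endpoint accepts the Box object unencrypted.
    """
    request = urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        # JSON content type: .asmx script services typically need it to answer in JSON
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# Box from the example above; calling fetch_locations(payload) needs network access
payload = box_payload(52.39231101879802, 14.248666763305664,
                      52.42927461241364, 14.337587356567385)
print(json.dumps(payload))
```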

The response is JSON, but the location data inside it is AES-encrypted. The decryption code is in the JavaScript loaded with the page (functions CryptParams and DecryptData).

After decryption you get nicely structured data: "[{"Titel":"018126","Lng":14.311666,"Lat":52.428888,"fID":1076,"sonderseite":false},{"Titel":"011720","Lng":14.259722,"Lat":52.423054,"fID":2196,"sonderseite":false},{"Titel":"87011082","Lng":14.275832,"Lat":52.401666,"fID":560919,"sonderseite":false}]"
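Since the decrypted payload is plain JSON, reading it in Python only needs the standard json module. In this sketch, Titel is presumably the ID the question sees in the tooltip (an assumption based on its format), and Lat/Lng give the marker position.

```python
import json

# Decrypted sample from above
decrypted = (
    '[{"Titel":"018126","Lng":14.311666,"Lat":52.428888,"fID":1076,"sonderseite":false},'
    '{"Titel":"011720","Lng":14.259722,"Lat":52.423054,"fID":2196,"sonderseite":false},'
    '{"Titel":"87011082","Lng":14.275832,"Lat":52.401666,"fID":560919,"sonderseite":false}]'
)

stations = json.loads(decrypted)
for station in stations:
    # Titel stays a string, so leading zeros such as "018126" are preserved
    print(station["Titel"], station["Lat"], station["Lng"])
```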

You have two options:

  1. Use Selenium or similar software to render the JavaScript, then parse the resulting DOM;

  2. Write a parser that sends a request to the GetStandorteFreigabe endpoint and decodes its response (porting the decryption code from JavaScript to Python).
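For the first option, the question's own requests_html setup can stand in for Selenium, because render() accepts a script argument: a JavaScript function evaluated in the page, whose return value render() passes back. The sketch below hovers over each blue marker by dispatching synthetic mouseover/mouseout events and reads the tooltip text. Several parts are assumptions: that the map reacts to synthetic events, that the tooltip is created synchronously, and the tooltip selector itself (TOOLTIP_SELECTOR is a hypothetical placeholder, Leaflet's default class, to be replaced with the real one from the browser's inspector). requests_html evaluates the script right after the page loads, before its sleep delay, so the script waits for the markers itself; the 3-second wait mirrors the question's sleep=3 and is a guess.

```python
# Hypothetical tooltip selector (Leaflet's default class, assuming the map uses Leaflet)
TOOLTIP_SELECTOR = ".leaflet-tooltip"

# Evaluated in the page by render(script=...); the async function is awaited
# and its return value (a list of strings or nulls) comes back from render().
HOVER_SCRIPT = """
async () => {
    // Give the map time to draw its markers before hovering them
    await new Promise(resolve => setTimeout(resolve, 3000));
    const ids = [];
    for (const img of document.querySelectorAll('img[src="images/funk_hf.png"]')) {
        img.dispatchEvent(new MouseEvent('mouseover', {bubbles: true}));
        const tip = document.querySelector('TOOLTIP_SELECTOR');
        ids.push(tip ? tip.textContent.trim() : null);
        img.dispatchEvent(new MouseEvent('mouseout', {bubbles: true}));
    }
    return ids;
}
""".replace("TOOLTIP_SELECTOR", TOOLTIP_SELECTOR)


def scrape_tooltip_ids(html):
    """Hover every blue marker and return the non-empty tooltip texts.

    `html` is a requests_html HTML object, e.g. r.html from the question.
    """
    ids = html.render(script=HOVER_SCRIPT) or []
    return [text for text in ids if text]
```

With the question's code, scrape_tooltip_ids(r.html) would take the place of r.html.render(sleep = 3). In Selenium, the equivalent hover is ActionChains(driver).move_to_element(marker).perform().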

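Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.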

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM