
Is there a way to extract information from shadow-root on a Website?

I am setting up code to check the reputation of any URL, e.g. http://go.mobisla.com/, on the website https://www.virustotal.com/gui/home/url.

First, the very basic thing I am doing is extracting all of the website's contents using BeautifulSoup, but it seems the information I am looking for is inside a #shadow-root (open), under div.detections and span.individual-detection.

Example element copied from the webpage results:

No engines detected this URL

I am new to Python and wondering if you can share the best way to extract this information.

I tried the requests.get() function, but it does not return the required information.

import requests
from bs4 import BeautifulSoup

url_check = "deloplen.com:443"
url = "https://www.virustotal.com/gui/home/url"

# the original called requests.get(url + url_str), but url_str was never
# defined; url_check appears to be the variable that was intended
req = requests.get(url + url_check)
html = req.text
soup = BeautifulSoup(html, 'html.parser')
print(soup.prettify())

I expect to see "2 engines detected this URL" along with a detection example such as Dr.Web: Malicious.

If you scrape their website, it will only return VirusTotal's loading screen, as this isn't the proper way to get the data.


Instead, what you're supposed to do is use their public API to make requests. However, you'll have to create an account to obtain a public API key.

You can use the following code, which retrieves JSON info about the link. You'll have to fill in the API key with your own.

import requests, json

user_api_key = "<api key>"
resource = "deloplen.com:443"

# feel free to remove this, just makes it look nicer
def pp_json(json_thing, sort=True, indents=4):
    # accept either a JSON string or an already-parsed object
    if isinstance(json_thing, str):
        json_thing = json.loads(json_thing)
    return json.dumps(json_thing, sort_keys=sort, indent=indents)

response = requests.get("https://www.virustotal.com/vtapi/v2/url/report?apikey="
                        + user_api_key + "&resource=" + resource)

json_response = response.json()

# the original version printed inside pp_json and then printed its None
# return value; returning the formatted string and printing once fixes that
print(pp_json(json_response))
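As a side note, building the query string by hand works for this resource, but a value containing characters such as &, =, or spaces would break it. The standard library's urllib.parse.urlencode escapes such characters safely; a minimal sketch (no network call is made here, it only builds the URL):

```python
from urllib.parse import urlencode

params = {"apikey": "<api key>", "resource": "deloplen.com:443"}
query = urlencode(params)  # percent-encodes reserved characters like ':'
url = "https://www.virustotal.com/vtapi/v2/url/report?" + query
print(url)
```

You can then pass `url` straight to requests.get(), or skip the manual step entirely by passing `params=params` to requests.get().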

If you want to learn more about the API, you can use their documentation.
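Once the report comes back, you can reduce it to the one-line summary the web UI shows. A minimal sketch, assuming the v2 /url/report response carries "positives" and "total" fields; the sample dict below is made up for illustration, not real scan data:

```python
def summarize_report(report):
    """Turn a v2 url/report dict into a one-line summary like the web UI's."""
    positives = report.get("positives", 0)
    total = report.get("total", 0)
    if positives == 0:
        return "No engines detected this URL"
    return f"{positives} engines detected this URL (out of {total})"

# Made-up sample shaped like a v2 /url/report response (illustration only)
sample = {
    "positives": 2,
    "total": 70,
    "scans": {"Dr.Web": {"detected": True, "result": "malicious site"}},
}
print(summarize_report(sample))  # → 2 engines detected this URL (out of 70)
```

Individual engine verdicts (e.g. the Dr.Web entry the question mentions) live under the "scans" key, so you can iterate over `report["scans"].items()` to list them.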

