
Is there a way to extract information from shadow-root on a Website?

I am setting up code to check the reputation of any URL, e.g. http://go.mobisla.com/, on the website https://www.virustotal.com/gui/home/url.

First, the very basic thing I am doing is extracting all of the website's contents using BeautifulSoup, but it seems the information I am looking for is inside a #shadow-root (open), under div.detections and span.individual-detection.

Example element copied from the webpage results:

No engines detected this URL

I am new to Python and wondering if you can share the best way to extract this information.

I tried the requests.get() function, but it does not return the required information.

import requests
from bs4 import BeautifulSoup

url_check = "deloplen.com:443"
url = "https://www.virustotal.com/gui/home/url"

# the original called requests.get(url + url_str), but url_str was never
# defined; url_check appears to be the variable that was intended
req = requests.get(url + url_check)
html = req.text
soup = BeautifulSoup(html, 'html.parser')
print(soup.prettify())

I expect to see "2 engines detected this URL" along with a detection example such as Dr.Web: Malicious.

If you scrape their website, it will only return VirusTotal's loading screen, as this isn't the proper way to get the data.


Instead, what you're supposed to do is use their public API to make requests. However, you'll have to create an account to obtain a public API key.

You can use the following code, which retrieves JSON info about the link. You'll have to fill in the API key with your own.

import requests, json

user_api_key = "<api key>"
resource = "deloplen.com:443"

# feel free to remove this, just makes it look nicer
def pp_json(json_thing, sort=True, indents=4):
    # accept either a JSON string or an already-parsed object
    if isinstance(json_thing, str):
        json_thing = json.loads(json_thing)
    return json.dumps(json_thing, sort_keys=sort, indent=indents)

response = requests.get("https://www.virustotal.com/vtapi/v2/url/report?apikey="
                        + user_api_key + "&resource=" + resource)

json_response = response.json()

# the original version printed inside pp_json and then printed its None
# return value; returning the formatted string and printing once fixes that
print(pp_json(json_response))
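As a side note, building the query string by hand works for this resource, but a value containing characters such as &, =, or spaces would break it. The standard library's urllib.parse.urlencode escapes such characters safely; a minimal sketch (no network call is made here, it only builds the URL):

```python
from urllib.parse import urlencode

params = {"apikey": "<api key>", "resource": "deloplen.com:443"}
query = urlencode(params)  # percent-encodes reserved characters like ':'
url = "https://www.virustotal.com/vtapi/v2/url/report?" + query
print(url)
```

You can then pass `url` straight to requests.get(), or skip the manual step entirely by passing `params=params` to requests.get().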

If you want to learn more about the API, you can use their documentation.
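Once the report comes back, you can reduce it to the one-line summary the web UI shows. A minimal sketch, assuming the v2 /url/report response carries "positives" and "total" fields; the sample dict below is made up for illustration, not real scan data:

```python
def summarize_report(report):
    """Turn a v2 url/report dict into a one-line summary like the web UI's."""
    positives = report.get("positives", 0)
    total = report.get("total", 0)
    if positives == 0:
        return "No engines detected this URL"
    return f"{positives} engines detected this URL (out of {total})"

# Made-up sample shaped like a v2 /url/report response (illustration only)
sample = {
    "positives": 2,
    "total": 70,
    "scans": {"Dr.Web": {"detected": True, "result": "malicious site"}},
}
print(summarize_report(sample))  # → 2 engines detected this URL (out of 70)
```

Individual engine verdicts (e.g. the Dr.Web entry the question mentions) live under the "scans" key, so you can iterate over `report["scans"].items()` to list them.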

