
Python 3 - JSONDecodeError when scraping the Microsoft CVE webpage

I read the question below. Its answer provides some code for further testing.

How can I scrape through the Microsoft CVE Webpage that assigns its content dynamically (preferably using Python)?

Here is what happens when I run my code. Could someone advise on the error below?

Python 3.6.8 (default, Sep 26 2019, 11:57:09)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-39)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import requests
>>>
>>> cve_url = "https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/CVE-2020-0910"
>>>
>>> response = requests.get(cve_url)
>>> cve_dict = response.json()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/site-packages/requests/models.py", line 898, in json
    return complexjson.loads(self.text, **kwargs)
  File "/usr/lib64/python3.6/json/__init__.py", line 354, in loads
    return _default_decoder.decode(s)
  File "/usr/lib64/python3.6/json/decoder.py", line 339, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib64/python3.6/json/decoder.py", line 357, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 2)

>>> print(response.text)

There is a response from the site. The response text is available here.

https://ybin.me/p/4302365fe913f62c#sdm8+KPnPhPQ8NfX9rrb2LuLgWUm5RgrnNSvd9Rtfd8=

Thank you.

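Try looking through the other requests the page makes (for example, in the Network tab of the browser's developer tools) to see whether any of them return a raw JSON response.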
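To see why `response.json()` fails, it can help to check what the server actually sent before parsing it. The following is a minimal sketch using only the standard library; `parse_json_body` is a hypothetical helper name, and the sample bodies are made up for illustration rather than copied from the Microsoft site.

```python
import json


def parse_json_body(body, content_type=""):
    """Return (data, None) on success, or (None, reason) if body is not JSON."""
    # The header is only a hint: some servers mislabel JSON, so we still try to parse.
    hint = "Content-Type is %r" % content_type
    try:
        return json.loads(body), None
    except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
        reason = "%s; parse failed: %s; body starts with %r" % (hint, exc, body[:40])
        return None, reason


# Illustrative inputs (assumptions, not real responses from the site).
html_body = "\r\n<!DOCTYPE html><html><body>page</body></html>"
json_body = '{"cveNumber": "CVE-2020-0910"}'

print(parse_json_body(html_body, "text/html"))
print(parse_json_body(json_body, "application/json"))
```

With `requests`, the same helper could be called as `parse_json_body(response.text, response.headers.get("Content-Type", ""))`, which reports the problem instead of raising a traceback.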

I used the wrong Microsoft URL. It should be the API endpoint below:

https://portal.msrc.microsoft.com/api/security-guidance/en-US/CVE/CVE-2020-0910

import requests

# API endpoint rather than the advisory page
cve_url = "https://portal.msrc.microsoft.com/api/security-guidance/en-US/CVE/CVE-2020-0910"

response = requests.get(cve_url)
cve_dict = response.json()  # parse the JSON body into a dict

# print(response.text)  # uncomment to inspect the raw body

print(cve_dict)

And I get the JSON response.

{'publishedDate': '2020-04-14T07:00:00Z', 'cveNumber': 'CVE-2020-0910', 'affectedProducts': [{'name': 'Windows 10 Version 1809 for x64-based Systems', 'platform': '', 'impactId': 100000005, 'impact': 'Remote Code Execution', 'severityId': 100000000, 'severity': 'Critical', 'baseScore': 8.4, 'temporalScore': 7.6, 'environmentScore': None, 'vectorString': 'CVSS:3.0/AV:A/AC:L/PR:H/UI:N/S:C/C:H/I:H/A:H/E:P/RL:O/RC:C', 'supersedence': '4538461', 'knowledgeBaseId': None, 'knowledgeBaseUrl': None, 'monthlyKnowledgeBaseId': None, 'monthlyKnowledgeBaseUrl': None, 'downloadUrl': None, 'downloadTitle': None, 'monthlyDownloadUrl': None, 'monthlyDownloadTitle': None, 'articleTitle1': '4549949', 'articleUrl1': 'https://support.microsoft.com/help/4549949', 'downloadTitle1': 'Security Update', 'downloadUrl1': 'https://catalog.update.microsoft.com/v7/site/Search.aspx?q=KB4549949', 'doesRowOneHaveAtLeastOneArticleOrUrl': True, 'articleTitle2': '', 'articleUrl2': '', 'downloadTitle2': '', 'downloadUrl2': '', 'doesRowTwoHaveAtLeastOneArticleOrUrl': False, 'articleTitle3': '', 'articleUrl3': '', 'downloadTitle3': '', 'downloadUrl3': '', 'doesRowThreeHaveAtLeastOneArticleOrUrl': False, 'articleTitle4': '', 'articleUrl4': '', 'downloadTitle4': '', 'downloadUrl4': '', 'doesRowFourHaveAtLeastOneArticleOrUrl': False 
....
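Once the response is a dict, individual fields can be pulled out of the `affectedProducts` list. This is a rough sketch: the sample dict below is a trimmed copy of the output shown above (only a few keys kept), and `summarize_products` is a hypothetical helper, not part of any Microsoft API.

```python
# Trimmed copy of the response printed above; the real one has many more keys.
cve_dict = {
    "cveNumber": "CVE-2020-0910",
    "affectedProducts": [
        {
            "name": "Windows 10 Version 1809 for x64-based Systems",
            "impact": "Remote Code Execution",
            "severity": "Critical",
            "baseScore": 8.4,
            "articleUrl1": "https://support.microsoft.com/help/4549949",
        },
    ],
}


def summarize_products(cve):
    """Return (name, severity, baseScore, articleUrl1) for each affected product."""
    rows = []
    # .get() avoids a KeyError if a key is missing from a given response.
    for product in cve.get("affectedProducts", []):
        rows.append((
            product.get("name"),
            product.get("severity"),
            product.get("baseScore"),
            product.get("articleUrl1") or None,  # empty strings become None
        ))
    return rows


for name, severity, score, url in summarize_products(cve_dict):
    print(f"{cve_dict['cveNumber']}: {name} | {severity} | CVSS {score} | {url}")
```

Using `.get()` with defaults is a deliberate choice here, since the full set of keys may vary between CVEs.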

I was also getting the same error with the following URL:

https://portal.msrc.microsoft.com/en-US/security-guidance/advisory/CVE-2018-8176

Later I tried the following example to download the JSON with the CVE data. Hope this helps:

import json
import requests
from bs4 import BeautifulSoup

url = (
    "https://api.msrc.microsoft.com/sug/v2.0/en-US/vulnerability/CVE-2018-8176"
)

data = requests.get(url).json()

# uncomment to print all data:
# print(json.dumps(data, indent=4))

print(data["cveTitle"])
print(BeautifulSoup(data["description"], "html.parser").get_text(strip=True))

Output:

Microsoft PowerPoint Remote Code Execution Vulnerability
A remote code execution vulnerability exists in Microsoft PowerPoint software when the software fails to properly validate XML content. An attacker who successfully exploited the vulnerability could run arbitrary code in the context of the current user. If the current user is logged on with administrative user rights, an attacker could take control of the affected system. An attacker could then install programs; view, change, or delete data; or create new accounts with full user rights. Users whose accounts are configured to have fewer user rights on the system could be less impacted than users who operate with administrative user rights.Exploitation of the vulnerability requires that a user open a specially crafted file with an affected version of Microsoft Office PowerPoint software. In an email attack scenario, an attacker could exploit the vulnerability by sending the specially crafted file to the user and convincing the user to open the file. In a web-based attack scenario, an attacker could host a website (or leverage a compromised website that accepts or hosts user-provided content) that contains a specially crafted file designed to exploit the vulnerability. An attacker would have no way to force users to visit the website. Instead, an attacker would have to convince users to click a link, typically by way of an enticement in an email or instant message, and then convince them to open the specially crafted file. After the file is open, the user would need to move their mouse over a specific location on the page within the PowerPoint file to trigger the vulnerability.Note that the Preview Pane is not an attack vector for this vulnerability. The security update addresses the vulnerability by correcting how Microsoft PowerPoint handles objects in memory.
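In the output above some sentences run together (for example "rights.Exploitation"), because `get_text(strip=True)` joins the text nodes with no separator. Passing a separator, as in `get_text(separator=" ", strip=True)`, is the simplest fix. Alternatively, here is a standard-library sketch that keeps paragraph breaks. It assumes the `description` field wraps paragraphs in block tags such as `<p>`, which I have not verified for every CVE; `TextExtractor` and `html_to_text` are hypothetical names, and the sample HTML is made up.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, starting a new paragraph at block-level tags."""

    BLOCK_TAGS = {"p", "br", "div", "li"}

    def __init__(self):
        super().__init__()
        self.paragraphs = [[]]

    def _break(self):
        # Only open a new paragraph if the current one has content.
        if self.paragraphs[-1]:
            self.paragraphs.append([])

    def handle_starttag(self, tag, attrs):
        if tag in self.BLOCK_TAGS:
            self._break()

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS:
            self._break()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.paragraphs[-1].append(text)


def html_to_text(html):
    """Return the text of an HTML fragment with blank lines between paragraphs."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(" ".join(p) for p in parser.paragraphs if p)


# Made-up fragment for illustration only.
sample = "<p>First paragraph.</p><p>Second <b>bold</b> paragraph.</p>"
print(html_to_text(sample))
```

With the answer's code this could be used as `print(html_to_text(data["description"]))` in place of the BeautifulSoup call.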

