
How to print selected text from JSON file using Python

I'm new to Python and have undertaken my first project to automate something for my role (I'm in the network space, so forgive me if this is terrible!).

I'm required to download a .json file from the link below:

https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519

My script goes through and retrieves the manual download link.

The reason I'm getting the URL this way is that the download link changes every fortnight when MS updates the file.

I'd like to extract the "addressPrefixes" contents from the entries named "AzureCloud.australiacentral", "AzureCloud.australiacentral2", "AzureCloud.australiaeast" and "AzureCloud.australiasoutheast".

I then want to strip out the " and , characters.

Each of the subnet ranges should then reside on a new line and be placed in a text file.

If I run the code below, I get the output I want, but only for a single entry.

Am I correct in thinking that I can use a for loop to achieve this? If so, would it be better to use a Python dictionary as opposed to using JSON formatted output?

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# Script to check Azure IPs

# Import modules for script
import json
import re
import urllib.request

import requests

# The download link changes with each release, so find it on the confirmation page
search = r'https://download.*?\.json'
ms_dl_centre = "https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519"
requests_get = requests.get(ms_dl_centre)
json_url_search = re.search(search, requests_get.text)

json_file = json_url_search.group(0)

with urllib.request.urlopen(json_file) as url:
    contents = json.loads(url.read().decode())

# Print the address prefixes of JSON entry 1 only
print(json.dumps(contents['values'][1]['properties']['addressPrefixes'], indent=0))

I'm not convinced that using re to parse HTML is a good idea; BeautifulSoup is better suited to the task. Inspecting the HTML response, I note that there's a span element of class file-link-view1 that seems to uniquely identify the URL of the JSON download. Assuming that is a robust approach (i.e. Microsoft doesn't change the way the download URL is presented), this is how I'd do it:

import requests
from bs4 import BeautifulSoup

namelist = ["AzureCloud.australiacentral", "AzureCloud.australiacentral2",
            "AzureCloud.australiaeast", "AzureCloud.australiasoutheast"]
baseurl = 'https://www.microsoft.com/en-us/download/confirmation.aspx?id=56519'

with requests.Session() as session:
    # Fetch the confirmation page and locate the current download link
    response = session.get(baseurl)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    downloadurl = soup.find('span', class_='file-link-view1').find('a')['href']

    # Download the JSON file and parse it into a dict
    response = session.get(downloadurl)
    response.raise_for_status()
    data = response.json()

    # Print the name and address prefixes of each wanted entry
    for n in data['values']:
        if n['name'] in namelist:
            print(n['name'])
            for ap in n['properties']['addressPrefixes']:
                print(ap)
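
The question also asks for each range to go on its own line in a text file, which the code above doesn't do. A minimal sketch of that step, assuming the JSON has already been parsed into a dict. The write_prefixes helper, the output file name and the sample data are all illustrative placeholders (the prefixes are documentation ranges, not real Azure data):

```python
from pathlib import Path


def write_prefixes(data, names, out_path):
    """Write the address prefixes of the named entries to out_path, one per line."""
    prefixes = [
        prefix
        for entry in data["values"]
        if entry["name"] in names
        for prefix in entry["properties"]["addressPrefixes"]
    ]
    # Each prefix is already a plain str, so there are no quotes or commas to strip;
    # those only appear when the list is re-serialised with json.dumps.
    Path(out_path).write_text("".join(p + "\n" for p in prefixes), encoding="utf-8")
    return prefixes


# Made-up sample with the same shape as the downloaded file (placeholder ranges).
sample = {
    "values": [
        {"name": "AzureCloud.australiacentral",
         "properties": {"addressPrefixes": ["192.0.2.0/24", "198.51.100.0/24"]}},
        {"name": "AzureCloud.westeurope",
         "properties": {"addressPrefixes": ["203.0.113.0/24"]}},
    ]
}
namelist = ["AzureCloud.australiacentral", "AzureCloud.australiacentral2",
            "AzureCloud.australiaeast", "AzureCloud.australiasoutheast"]

written = write_prefixes(sample, namelist, "azure_prefixes.txt")
```

With the real download, the dict returned by response.json() would take the place of sample.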


 