
Python 2.7 BeautifulSoup, website addresses scraping

Hope you are all well. I'm new to Python and am using Python 2.7.

I'm trying to extract only the websites from this public business directory: https://www.dmcc.ae/business-directory
The websites I'm looking for are the ones shown in each widget. Unfortunately, this directory does not have an API.
I'm using BeautifulSoup, but with no success so far.
Here is my code:

import urllib
from bs4 import BeautifulSoup
website = raw_input("Type Website:>\n")
html = urllib.urlopen('https://'+ website).read()
soup = BeautifulSoup(html)
tags = soup('a')
for tag in tags:
    print tag.get('href', None)

What I get is just the links of the site itself, such as http://portal.dmcc.ae, along with other hrefs, rather than the websites in the widgets. I also tried replacing soup('a') with soup('class'), but no luck. Can anybody help me, please?

The data is generated dynamically with jQuery through an Ajax request. You can send a GET request to that URL yourself to get the dynamically loaded data:

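from bs4 import BeautifulSoup
from requests import Session
from time import time

# Query-string parameters sent with the Ajax request.
params = {
    "page_num": "1",  # set it to whatever page you like
    "query_type": "activities",
    "_": str(int(time()))}  # timestamp, as jQuery appends to the request
js_url = "https://dmcc.secure.force.com/services/apexrest/DMCC_BusinessDirectory_API_1/get"
with Session() as s:
    # Load the directory page first in the same session; the soup is not used below.
    soup = BeautifulSoup(s.get("https://www.dmcc.ae/business-directory").content, "html.parser")
    r = s.get(js_url, params=params)

    # Decode the JSON response into a dict.
    data = r.json()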

Which will give you:

{u'success': True, u'requestURI': u'/DMCC_BusinessDirectory_API_1/get', u'params': [u'DMCC_BusinessDirectory_API_1', u'get', u' ', u' '], u'message': u'Getting all activities.', u'sObjects': [{u'Account__c': u'001b000000MV4LaAAL', u'Building__c': u'55.1450717', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XPVOEA4', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XPVOEA4', u'Function_Type_Class__c': u'Office', u'Name': u'PL-032972'}, u'Operating_Name__c': u'001b000000MV4LaAAL', u'Longitude__c': u'55.1450717', u'License_Address_for_Business_Directroy__c': u'Unit No: 3006-002<br>Mazaya Business Avenue BB1<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XPVOEA4', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a03b0000006h16cAAA', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV4LaAAL', u'type': u'Account'}, u'Id': u'001b000000MV4LaAAL', u'Name': u'1 ON 1 HR CONSULTING DMCC'}, u'License_Address__c': u'Unit No: 3006-002<br>Mazaya Business Avenue BB1<br>Plot No: JLTE-PH2-BB1<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'Id': u'a03b0000006h16cAAA', u'Latitude__c': u'25.06828081', u'Account__r': {u'Company_Official_Email_Address__c': u'resume@1on1hrconsulting.com', u'Monday_To__c': u'18:00', u'Tuesday_To__c': u'18:00', u'Thursday_To__c': u'18:00', u'Id': u'001b000000MV4LaAAL', u'Operating_Time_from_regular__c': u'08:00', u'Name': u'1 ON 1 HR CONSULTING DMCC', u'Tuesday_From__c': u'08:00', u'LinkedIn_URL__c': u'https://www.linkedin.com/company/1on1-hr-consulting', u'Saturday_From__c': u'Closed', u'Phone_BD__c': u'+97144470173', u'Facebook_Link__c': u'https://www.facebook.com/duabiinterviewandresumecoaching', u'Wednesday_To__c': u'18:00', u'Operating_Time_to_regular__c': u'18:00', u'Friday_To__c': u'Closed', u'Friday_From__c': u'Closed', 
u'Monday_From__c': u'08:00', u'Saturday_To__c': u'Closed', u'Company_Website_Address__c': u'www.1on1hrconsulting.com', u'Wednesday_From__c': u'08:00', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV4LaAAL', u'type': u'Account'}, u'Thursday_From__c': u'08:00', u'Publishing_agreement_for_BD__c': u'Publish all details in DMCC online/printed content'}}, {u'Account__c': u'001b000000MV2s2AAD', u'Building__c': u'Gold Tower (AU)', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XHiKEAW', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XHiKEAW', u'Function_Type_Class__c': u'Office', u'Name': u'PL-005466'}, u'Operating_Name__c': u'001b000000MV2s2AAD', u'Longitude__c': u'55.14324963', u'License_Address_for_Business_Directroy__c': u'Unit No: AU-23-J<br>Gold Tower (AU)<br>Jumeirah Lakes Towers<br>Dubai<br>United Arab Emirates', u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XHiKEAW', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a03b0000006h1DaAAI', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV2s2AAD', u'type': u'Account'}, u'Id': u'001b000000MV2s2AAD', u'Name': u'1 PLATINUM CONCIERGE MANAGEMENT DMCC'}, u'License_Address__c': u'Unit No: AU-23-J<br>Gold Tower (AU)<br>Plot No: JLT-PH1-I3A<br>Jumeirah Lakes Towers<br>Dubai<br>United Arab Emirates', u'Id': u'a03b0000006h1DaAAI', u'Latitude__c': u'25.06937097', u'Account__r': {u'Saturday_To__c': u'Closed', u'Tuesday_From__c': u'09:00', u'Friday_From__c': u'Closed', u'Phone_BD__c': u'+971505687478', u'Monday_From__c': u'09:00', u'Monday_To__c': u'21:00', u'Publishing_agreement_for_BD__c': u'Publish only name and address in DMCC online/printed content', u'Company_Official_Email_Address__c': u'melvin@1boxoffice.ae', u'Company_Website_Address__c': u'www.1platinumconcierge.com', u'Operating_Time_to_regular__c': 
u'21:00', u'Friday_To__c': u'Closed', u'Tuesday_To__c': u'21:00', u'Wednesday_To__c': u'21:00', u'Wednesday_From__c': u'09:00', u'Thursday_To__c': u'21:00', u'Saturday_From__c': u'Closed', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV2s2AAD', u'type': u'Account'}, u'Thursday_From__c': u'09:00', u'Id': u'001b000000MV2s2AAD', u'Operating_Time_from_regular__c': u'09:00', u'Name': u'1 PLATINUM CONCIERGE MANAGEMENT DMCC'}}, {u'Account__c': u'0011000000jki9iAAA', u'Building__c': u'Tiffany Towers', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XL4ZEAW', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XL4ZEAW', u'Function_Type_Class__c': u'Office', u'Name': u'PL-015938'}, u'Operating_Name__c': u'0011000000jki9iAAA', u'Longitude__c': u'55.14960835', u'License_Address_for_Business_Directroy__c': u'Unit No: 1906<br>Tiffany Towers<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XL4ZEAW', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a031000000LBSCcAAP', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000jki9iAAA', u'type': u'Account'}, u'Id': u'0011000000jki9iAAA', u'Name': u'1000HEADS CONSULTING DMCC'}, u'License_Address__c': u'Unit No: 1906<br>Tiffany Towers<br>Plot No: JLT-PH2-W2A<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'Id': u'a031000000LBSCcAAP', u'Latitude__c': u'25.07726334', u'Account__r': {u'Company_Official_Email_Address__c': u'dubai@1000heads.com', u'Monday_To__c': u'18:00', u'Tuesday_To__c': u'18:00', u'Thursday_To__c': u'18:00', u'Id': u'0011000000jki9iAAA', u'Operating_Time_from_regular__c': u'09:00', u'Name': u'1000HEADS CONSULTING DMCC', u'Tuesday_From__c': u'09:00', u'Twitter_Name__c': u'@1000heads', u'Saturday_From__c': u'Closed', u'Phone_BD__c': u'+97143641221', u'Friday_To__c': u'Closed', 
u'Wednesday_To__c': u'18:00', u'Operating_Time_to_regular__c': u'18:00', u'Friday_From__c': u'Closed', u'Monday_From__c': u'09:00', u'Saturday_To__c': u'Closed', u'Company_Website_Address__c': u'www.1000heads.com', u'Wednesday_From__c': u'09:00', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000jki9iAAA', u'type': u'Account'}, u'Thursday_From__c': u'09:00', u'Publishing_agreement_for_BD__c': u'Publish only name and address in DMCC online/printed content'}}, {u'Account__c': u'0011000000jkjRyAAI', u'Building__c': u'Platinum Tower', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XKGtEAO', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XKGtEAO', u'Function_Type_Class__c': u'Retail', u'Name': u'PL-012858'}, u'Operating_Name__c': u'0011000000jkjRyAAI', u'Longitude__c': u'55.14244634', u'License_Address_for_Business_Directroy__c': u'Unit No: G07<br>Platinum Tower<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XKGtEAO', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a031000000Li7cUAAR', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000jkjRyAAI', u'type': u'Account'}, u'Id': u'0011000000jkjRyAAI', u'Name': u'101 PARATHAS DMCC'}, u'License_Address__c': u'Unit No: G07<br>Platinum Tower<br>Plot No: JLT-PH1-I2<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'Id': u'a031000000Li7cUAAR', u'Latitude__c': u'25.06927734', u'Account__r': {u'Company_Official_Email_Address__c': u'pankajpathak@hotmail.com', u'Monday_To__c': u'23:00', u'Tuesday_To__c': u'23:00', u'Thursday_To__c': u'23:00', u'Id': u'0011000000jkjRyAAI', u'Operating_Time_from_regular__c': u'09:00', u'Name': u'101 PARATHAS DMCC', u'Tuesday_From__c': u'09:00', u'Twitter_Name__c': u'www.twitter.com/101parathas', u'Saturday_From__c': u'09:00', u'Phone_BD__c': 
u'+97144249950', u'Facebook_Link__c': u'www.facebook.com/101parathas', u'Wednesday_To__c': u'23:00', u'Operating_Time_to_regular__c': u'23:00', u'Friday_To__c': u'23:00', u'Friday_From__c': u'09:00', u'Monday_From__c': u'09:00', u'Saturday_To__c': u'23:00', u'Company_Website_Address__c': u'www.101parathas.com', u'Wednesday_From__c': u'09:00', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000jkjRyAAI', u'type': u'Account'}, u'Thursday_From__c': u'09:00', u'Publishing_agreement_for_BD__c': u'Publish all details in DMCC online/printed content'}}, {u'Account__c': u'0011000000kWyCsAAK', u'License_Status__c': u'Active', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XHXzEAO', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XHXzEAO', u'Function_Type_Class__c': u'Flexi Desk', u'Name': u'PL-004825'}, u'Account__r': {u'Saturday_To__c': u'Closed', u'Tuesday_From__c': u'09:00', u'Friday_From__c': u'Closed', u'Phone_BD__c': u'+971529859280', u'Monday_From__c': u'09:00', u'Monday_To__c': u'18:00', u'Publishing_agreement_for_BD__c': u'Publish only name and address in DMCC online/printed content', u'Company_Official_Email_Address__c': u'csaplala@lkmbgroup.com', u'Company_Website_Address__c': u'www', u'Operating_Time_to_regular__c': u'18:00', u'Friday_To__c': u'Closed', u'Tuesday_To__c': u'18:00', u'Wednesday_To__c': u'18:00', u'Wednesday_From__c': u'09:00', u'Thursday_To__c': u'18:00', u'Saturday_From__c': u'Closed', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000kWyCsAAK', u'type': u'Account'}, u'Thursday_From__c': u'09:00', u'Id': u'0011000000kWyCsAAK', u'Operating_Time_from_regular__c': u'09:00', u'Name': u'1682 CONSULTING DMCC'}, u'Building__c': u'55.13646334', u'Property_Location__c': u'a1G10000000XHXzEAO', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a031000000N2n8DAAR', u'type': u'License__c'}, u'License_Address__c': u'Unit No: 
3O-01-1057<br>Jewellery &amp; Gemplex 3<br>Plot No: DMCC-PH2-J&amp;GPlexS<br>Jewellery &amp; Gemplex<br>Dubai<br>United Arab Emirates', u'Id': u'a031000000N2n8DAAR'}, {u'Account__c': u'0011000000riesbAAA', u'Building__c': u'55.13646334', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XHGNEA4', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XHGNEA4', u'Function_Type_Class__c': u'Flexi Desk', u'Name': u'PL-003733'}, u'Operating_Name__c': u'00110000012j7wXAAQ', u'Account__r': {u'Saturday_To__c': u'Closed', u'Tuesday_From__c': u'09:00', u'Friday_From__c': u'Closed', u'Phone_BD__c': u'+971502473000', u'Monday_From__c': u'09:00', u'Monday_To__c': u'13:00', u'Publishing_agreement_for_BD__c': u'Publish only name and address in DMCC online/printed content', u'Company_Official_Email_Address__c': u'thaer@1765hospitlity.com', u'Company_Website_Address__c': u'www.1765hospitality.com', u'Operating_Time_to_regular__c': u'13:00', u'Friday_To__c': u'Closed', u'Tuesday_To__c': u'13:00', u'Wednesday_To__c': u'13:00', u'Wednesday_From__c': u'09:00', u'Thursday_To__c': u'13:00', u'Saturday_From__c': u'Closed', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000riesbAAA', u'type': u'Account'}, u'Thursday_From__c': u'09:00', u'Id': u'0011000000riesbAAA', u'Operating_Time_from_regular__c': u'09:00', u'Name': u'1765 HOSPITALITY DMCC'}, u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XHGNEA4', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a031000000TaAN9AAN', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/00110000012j7wXAAQ', u'type': u'Account'}, u'Id': u'00110000012j7wXAAQ', u'Name': u'SPIKY HOUSE OF CHICKEN'}, u'License_Address__c': u'Unit No: 3O-01-398<br>Jewellery &amp; Gemplex 3<br>Plot No: DMCC-PH2-J&amp;GPlexS<br>Jewellery &amp; Gemplex<br>Dubai<br>United Arab Emirates', u'Id': 
u'a031000000TaAN9AAN'}, {u'Account__c': u'0011000000xj8H3AAI', u'Building__c': u'55.15331119', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000001YJ39EAG', u'type': u'Property_Location__c'}, u'Id': u'a1G10000001YJ39EAG', u'Function_Type_Class__c': u'Office', u'Name': u'PL-180144'}, u'Longitude__c': u'55.15331119', u'License_Address_for_Business_Directroy__c': u'Unit No: 2703-B<br>Jumeirah Bay Tower X3<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000001YJ39EAG', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a031000000YZaeMAAT', u'type': u'License__c'}, u'License_Address__c': u'Unit No: 2703-B<br>Jumeirah Bay Tower X3<br>Plot No: JLT-PH2-X3A<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'Id': u'a031000000YZaeMAAT', u'Latitude__c': u'25.08018592', u'Account__r': {u'Company_Official_Email_Address__c': u'lobo.rosario@alwadiholding.com', u'Monday_To__c': u'20:00', u'Tuesday_To__c': u'20:00', u'Thursday_To__c': u'20:00', u'Operating_Time_from_regular__c': u'07:00', u'Name': u'1851 LAUNDRY AND DRY CLEANING SERVICES DMCC', u'Website': u'www1851laundries.com', u'Tuesday_From__c': u'07:00', u'Saturday_From__c': u'07:00', u'Phone_BD__c': u'+971508669551', u'Friday_To__c': u'20:00', u'Wednesday_To__c': u'20:00', u'Operating_Time_to_regular__c': u'20:00', u'Saturday_To__c': u'20:00', u'Friday_From__c': u'14:00', u'Monday_From__c': u'07:00', u'Publishing_agreement_for_BD__c': u'Publish all details in DMCC online/printed content', u'Company_Website_Address__c': u'www.1851laundries.com', u'Id': u'0011000000xj8H3AAI', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000xj8H3AAI', u'type': u'Account'}, u'Thursday_From__c': u'07:00', u'Wednesday_From__c': u'07:00'}}, {u'Account__c': u'0011000000rhvGWAAY', u'License_Status__c': u'Active', u'Property_Location__r': {u'attributes': {u'url': 
u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XHwFEAW', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XHwFEAW', u'Function_Type_Class__c': u'Flexi Desk', u'Name': u'PL-006329'}, u'Account__r': {u'Saturday_To__c': u'20:00', u'Tuesday_From__c': u'08:00', u'Friday_From__c': u'08:00', u'Phone_BD__c': u'+971562160241', u'Monday_From__c': u'08:00', u'Monday_To__c': u'20:00', u'Publishing_agreement_for_BD__c': u'Publish only name and address in DMCC online/printed content', u'Company_Official_Email_Address__c': u'l.olivari@gmail.com', u'Company_Website_Address__c': u'N.A.', u'Operating_Time_to_regular__c': u'20:00', u'Friday_To__c': u'20:00', u'Tuesday_To__c': u'20:00', u'Wednesday_To__c': u'20:00', u'Wednesday_From__c': u'08:00', u'Thursday_To__c': u'20:00', u'Saturday_From__c': u'08:00', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000rhvGWAAY', u'type': u'Account'}, u'Thursday_From__c': u'08:00', u'Id': u'0011000000rhvGWAAY', u'Operating_Time_from_regular__c': u'08:00', u'Name': u'19 FAMILY & BUSINESS DMCC'}, u'Building__c': u'55.13646334', u'Property_Location__c': u'a1G10000000XHwFEAW', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a031000000RhyojAAB', u'type': u'License__c'}, u'License_Address__c': u'Unit No: 3O-01-654<br>Jewellery &amp; Gemplex 3<br>Plot No: DMCC-PH2-J&amp;GPlexS<br>Jewellery &amp; Gemplex<br>Dubai<br>United Arab Emirates', u'Id': u'a031000000RhyojAAB'}, {u'Account__c': u'001b000000MV2kHAAT', u'Building__c': u'Tiffany Towers', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XJi2EAG', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XJi2EAG', u'Function_Type_Class__c': u'Office', u'Name': u'PL-010697'}, u'Operating_Name__c': u'001b000000MV2kHAAT', u'Longitude__c': u'55.14960835', u'License_Address_for_Business_Directroy__c': u'Unit No: 403<br>Tiffany Towers<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', 
u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XJi2EAG', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a03b0000006h21sAAA', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV2kHAAT', u'type': u'Account'}, u'Id': u'001b000000MV2kHAAT', u'Name': u'21ST CENTURY GROUP HOLDINGS LIMITED (BRANCH)'}, u'License_Address__c': u'Unit No: 403<br>Tiffany Towers<br>Plot No: JLT-PH2-W2A<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'Id': u'a03b0000006h21sAAA', u'Latitude__c': u'25.07726334', u'Account__r': {u'Saturday_To__c': u'Closed', u'Tuesday_From__c': u'08:00', u'Friday_From__c': u'Closed', u'Phone_BD__c': u'+97144269021', u'Monday_From__c': u'08:00', u'Monday_To__c': u'17:00', u'Publishing_agreement_for_BD__c': u'Publish all details in DMCC online/printed content', u'Company_Official_Email_Address__c': u'bhavani@areefinvestments.com', u'Company_Website_Address__c': u'www.areefinvestments.com', u'Operating_Time_to_regular__c': u'17:00', u'Friday_To__c': u'Closed', u'Tuesday_To__c': u'17:00', u'Wednesday_To__c': u'17:00', u'Wednesday_From__c': u'08:00', u'Thursday_To__c': u'17:00', u'Saturday_From__c': u'Closed', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV2kHAAT', u'type': u'Account'}, u'Thursday_From__c': u'08:00', u'Id': u'001b000000MV2kHAAT', u'Operating_Time_from_regular__c': u'08:00', u'Name': u'21ST CENTURY GROUP HOLDINGS LIMITED (BRANCH)'}}, {u'Account__c': u'001b000000MV3NPAA1', u'Building__c': u'55.13603472', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XHF4EAO', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XHF4EAO', u'Function_Type_Class__c': u'Flexi Desk', u'Name': u'PL-003652'}, u'Operating_Name__c': u'001b000000MV3NPAA1', u'Account__r': {u'Saturday_To__c': u'Closed', u'Tuesday_From__c': u'10:00', u'Friday_From__c': u'Closed', 
u'Phone_BD__c': u'+971556099922', u'Monday_From__c': u'10:00', u'Monday_To__c': u'16:00', u'Publishing_agreement_for_BD__c': u'Publish only name and address in DMCC online/printed content', u'Company_Official_Email_Address__c': u'sameh@237communications.com', u'Company_Website_Address__c': u'237communications.com', u'Operating_Time_to_regular__c': u'16:00', u'Friday_To__c': u'Closed', u'Tuesday_To__c': u'16:00', u'Wednesday_To__c': u'16:00', u'Wednesday_From__c': u'10:00', u'Thursday_To__c': u'16:00', u'Saturday_From__c': u'Closed', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV3NPAA1', u'type': u'Account'}, u'Thursday_From__c': u'10:00', u'Id': u'001b000000MV3NPAA1', u'Operating_Time_from_regular__c': u'10:00', u'Name': u'237 COMMUNICATIONS DMCC'}, u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XHF4EAO', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a03b0000006h16xAAA', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV3NPAA1', u'type': u'Account'}, u'Id': u'001b000000MV3NPAA1', u'Name': u'237 COMMUNICATIONS DMCC'}, u'License_Address__c': u'Unit No: 2H-05-412<br>Jewellery &amp; Gemplex 2<br>Plot No: DMCC-PH2-J&amp;GPlexS<br>Jewellery &amp; Gemplex<br>DUBAI<br>United Arab Emirates', u'Id': u'a03b0000006h16xAAA'}], u'result_count': 0}
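
The response above covers a single page (page_num is "1"). A rough sketch of how one might walk several pages follows. It assumes, without confirmation from the site, that pages are numbered from 1 and that an empty sObjects list means there is nothing more to fetch. The names build_params, iter_records, and get_json are hypothetical and introduced only for this illustration; get_json stands for whatever performs the HTTP call.

```python
from time import time

JS_URL = "https://dmcc.secure.force.com/services/apexrest/DMCC_BusinessDirectory_API_1/get"


def build_params(page_num):
    """Build the same query-string parameters as the single-page request."""
    return {
        "page_num": str(page_num),
        "query_type": "activities",
        "_": str(int(time())),
    }


def iter_records(get_json, max_pages=3):
    """Yield records page by page.

    get_json(url, params) must return the decoded JSON dict.
    Assumption: an empty 'sObjects' list marks the last page.
    """
    for page_num in range(1, max_pages + 1):
        payload = get_json(JS_URL, build_params(page_num))
        records = payload.get("sObjects") or []
        if not records:
            break  # nothing on this page, stop asking for more
        for record in records:
            yield record


# Possible use inside the Session block (not run here, needs network access):
#     fetch = lambda url, params: s.get(url, params=params).json()
#     for record in iter_records(fetch, max_pages=5):
#         print(record["Id"])
```

Passing the fetch function in keeps the loop independent of requests, and max_pages caps the number of calls so the site is not hit more than intended.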

What you want is in data[u'sObjects']. To get the company websites:

for d in data['sObjects']:
    if 'Company_Website_Address__c' in d['Account__r']:
        print(d['Account__r']['Company_Website_Address__c'])

Which gives you:

www.1on1hrconsulting.com
www.1platinumconcierge.com
www.1000heads.com
www.101parathas.com
www
www.1765hospitality.com
www.1851laundries.com
N.A.
www.areefinvestments.com
237communications.com

You can see that some companies don't have a website listed; you will have to decide what to do with those. If you just print d[u'Account__r'], you can see all the info for each company. You should also be aware that it is an internal API, so make sure you are not violating their terms of service by scraping their site, although they should probably implement their authToken logic a bit more stringently to prevent calls to the API if they don't want to be scraped so easily. You can see the token in Chrome's developer tools when you make a request, but it is not required.
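
One possible way to handle the entries without a real website is sketched below. It treats values such as www and N.A. (both appear in the output above) as placeholders and drops them, and it prefixes the remaining values with http://. The placeholder list, the choice of http://, and the helper names clean_website and extract_websites are assumptions made for this example, not something the directory defines.

```python
def clean_website(raw):
    """Return a usable URL, or None when the value is only a placeholder."""
    if raw is None:
        return None
    value = raw.strip()
    # Placeholder values; "www" and "N.A." come from the sample output,
    # the others are guesses and can be extended as needed.
    if value.lower() in ("", "www", "n.a.", "n/a", "na"):
        return None
    # The listed addresses carry no scheme, so add one (http is assumed).
    if not value.lower().startswith(("http://", "https://")):
        value = "http://" + value
    return value


def extract_websites(records):
    """Collect cleaned website URLs from a list of sObjects-style dicts."""
    sites = []
    for record in records:
        # Some records may lack 'Account__r' or the website key entirely.
        account = record.get("Account__r") or {}
        site = clean_website(account.get("Company_Website_Address__c"))
        if site is not None:
            sites.append(site)
    return sites


sample = [
    {"Account__r": {"Company_Website_Address__c": "www.1on1hrconsulting.com"}},
    {"Account__r": {"Company_Website_Address__c": "www"}},
    {"Account__r": {"Company_Website_Address__c": "N.A."}},
    {"Account__r": {"Company_Website_Address__c": "237communications.com"}},
    {},  # a record with no account details at all
]
print(extract_websites(sample))
```

With the real response, extract_websites(data['sObjects']) would take the place of the loop shown earlier.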


 