
Python 2.7 BeautifulSoup, website addresses scraping

Hope you are all well. I'm new to Python and I'm using Python 2.7.

I'm trying to extract only the websites from this public business directory: https://www.dmcc.ae/business-directory
The websites I'm looking for are the ones mentioned in each widget. Unfortunately, this directory does not have an API.
I'm using BeautifulSoup, but with no success so far.
Here is my code:

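import urllib
from bs4 import BeautifulSoup

website = raw_input("Type Website:>\n")  # e.g. www.dmcc.ae/business-directory
html = urllib.urlopen('https://' + website).read()
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly
tags = soup('a')  # shorthand for soup.find_all('a')
for tag in tags:
    print tag.get('href', None)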

What I get is just the links of the site itself, like http://portal.dmcc.ae, along with other hrefs, rather than the websites in the widgets. I also tried replacing soup('a') with soup('class'), but no luck! Can anybody help me, please?

The data is generated dynamically with jQuery through an AJAX request, so you can make a GET request to that URL yourself to get the dynamically loaded data:

from time import time

from bs4 import BeautifulSoup
from requests import Session

params = {
    "page_num": "1",  # set it to whatever page you like
    "query_type": "activities",
    "_": str(int(time())),  # cache-busting timestamp
}
js_url = "https://dmcc.secure.force.com/services/apexrest/DMCC_BusinessDirectory_API_1/get"
with Session() as s:
    # Load the directory page first in the same session (the parsed soup is not used below)
    soup = BeautifulSoup(s.get("https://www.dmcc.ae/business-directory").content, "html.parser")
    r = s.get(js_url, params=params)

    data = r.json()

Which will give you:

{u'success': True, u'requestURI': u'/DMCC_BusinessDirectory_API_1/get', u'params': [u'DMCC_BusinessDirectory_API_1', u'get', u' ', u' '], u'message': u'Getting all activities.', u'sObjects': [{u'Account__c': u'001b000000MV4LaAAL', u'Building__c': u'55.1450717', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XPVOEA4', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XPVOEA4', u'Function_Type_Class__c': u'Office', u'Name': u'PL-032972'}, u'Operating_Name__c': u'001b000000MV4LaAAL', u'Longitude__c': u'55.1450717', u'License_Address_for_Business_Directroy__c': u'Unit No: 3006-002<br>Mazaya Business Avenue BB1<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XPVOEA4', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a03b0000006h16cAAA', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV4LaAAL', u'type': u'Account'}, u'Id': u'001b000000MV4LaAAL', u'Name': u'1 ON 1 HR CONSULTING DMCC'}, u'License_Address__c': u'Unit No: 3006-002<br>Mazaya Business Avenue BB1<br>Plot No: JLTE-PH2-BB1<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'Id': u'a03b0000006h16cAAA', u'Latitude__c': u'25.06828081', u'Account__r': {u'Company_Official_Email_Address__c': u'resume@1on1hrconsulting.com', u'Monday_To__c': u'18:00', u'Tuesday_To__c': u'18:00', u'Thursday_To__c': u'18:00', u'Id': u'001b000000MV4LaAAL', u'Operating_Time_from_regular__c': u'08:00', u'Name': u'1 ON 1 HR CONSULTING DMCC', u'Tuesday_From__c': u'08:00', u'LinkedIn_URL__c': u'https://www.linkedin.com/company/1on1-hr-consulting', u'Saturday_From__c': u'Closed', u'Phone_BD__c': u'+97144470173', u'Facebook_Link__c': u'https://www.facebook.com/duabiinterviewandresumecoaching', u'Wednesday_To__c': u'18:00', u'Operating_Time_to_regular__c': u'18:00', u'Friday_To__c': u'Closed', u'Friday_From__c': u'Closed', 
u'Monday_From__c': u'08:00', u'Saturday_To__c': u'Closed', u'Company_Website_Address__c': u'www.1on1hrconsulting.com', u'Wednesday_From__c': u'08:00', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV4LaAAL', u'type': u'Account'}, u'Thursday_From__c': u'08:00', u'Publishing_agreement_for_BD__c': u'Publish all details in DMCC online/printed content'}}, {u'Account__c': u'001b000000MV2s2AAD', u'Building__c': u'Gold Tower (AU)', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XHiKEAW', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XHiKEAW', u'Function_Type_Class__c': u'Office', u'Name': u'PL-005466'}, u'Operating_Name__c': u'001b000000MV2s2AAD', u'Longitude__c': u'55.14324963', u'License_Address_for_Business_Directroy__c': u'Unit No: AU-23-J<br>Gold Tower (AU)<br>Jumeirah Lakes Towers<br>Dubai<br>United Arab Emirates', u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XHiKEAW', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a03b0000006h1DaAAI', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV2s2AAD', u'type': u'Account'}, u'Id': u'001b000000MV2s2AAD', u'Name': u'1 PLATINUM CONCIERGE MANAGEMENT DMCC'}, u'License_Address__c': u'Unit No: AU-23-J<br>Gold Tower (AU)<br>Plot No: JLT-PH1-I3A<br>Jumeirah Lakes Towers<br>Dubai<br>United Arab Emirates', u'Id': u'a03b0000006h1DaAAI', u'Latitude__c': u'25.06937097', u'Account__r': {u'Saturday_To__c': u'Closed', u'Tuesday_From__c': u'09:00', u'Friday_From__c': u'Closed', u'Phone_BD__c': u'+971505687478', u'Monday_From__c': u'09:00', u'Monday_To__c': u'21:00', u'Publishing_agreement_for_BD__c': u'Publish only name and address in DMCC online/printed content', u'Company_Official_Email_Address__c': u'melvin@1boxoffice.ae', u'Company_Website_Address__c': u'www.1platinumconcierge.com', u'Operating_Time_to_regular__c': 
u'21:00', u'Friday_To__c': u'Closed', u'Tuesday_To__c': u'21:00', u'Wednesday_To__c': u'21:00', u'Wednesday_From__c': u'09:00', u'Thursday_To__c': u'21:00', u'Saturday_From__c': u'Closed', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV2s2AAD', u'type': u'Account'}, u'Thursday_From__c': u'09:00', u'Id': u'001b000000MV2s2AAD', u'Operating_Time_from_regular__c': u'09:00', u'Name': u'1 PLATINUM CONCIERGE MANAGEMENT DMCC'}}, {u'Account__c': u'0011000000jki9iAAA', u'Building__c': u'Tiffany Towers', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XL4ZEAW', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XL4ZEAW', u'Function_Type_Class__c': u'Office', u'Name': u'PL-015938'}, u'Operating_Name__c': u'0011000000jki9iAAA', u'Longitude__c': u'55.14960835', u'License_Address_for_Business_Directroy__c': u'Unit No: 1906<br>Tiffany Towers<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XL4ZEAW', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a031000000LBSCcAAP', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000jki9iAAA', u'type': u'Account'}, u'Id': u'0011000000jki9iAAA', u'Name': u'1000HEADS CONSULTING DMCC'}, u'License_Address__c': u'Unit No: 1906<br>Tiffany Towers<br>Plot No: JLT-PH2-W2A<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'Id': u'a031000000LBSCcAAP', u'Latitude__c': u'25.07726334', u'Account__r': {u'Company_Official_Email_Address__c': u'dubai@1000heads.com', u'Monday_To__c': u'18:00', u'Tuesday_To__c': u'18:00', u'Thursday_To__c': u'18:00', u'Id': u'0011000000jki9iAAA', u'Operating_Time_from_regular__c': u'09:00', u'Name': u'1000HEADS CONSULTING DMCC', u'Tuesday_From__c': u'09:00', u'Twitter_Name__c': u'@1000heads', u'Saturday_From__c': u'Closed', u'Phone_BD__c': u'+97143641221', u'Friday_To__c': u'Closed', 
u'Wednesday_To__c': u'18:00', u'Operating_Time_to_regular__c': u'18:00', u'Friday_From__c': u'Closed', u'Monday_From__c': u'09:00', u'Saturday_To__c': u'Closed', u'Company_Website_Address__c': u'www.1000heads.com', u'Wednesday_From__c': u'09:00', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000jki9iAAA', u'type': u'Account'}, u'Thursday_From__c': u'09:00', u'Publishing_agreement_for_BD__c': u'Publish only name and address in DMCC online/printed content'}}, {u'Account__c': u'0011000000jkjRyAAI', u'Building__c': u'Platinum Tower', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XKGtEAO', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XKGtEAO', u'Function_Type_Class__c': u'Retail', u'Name': u'PL-012858'}, u'Operating_Name__c': u'0011000000jkjRyAAI', u'Longitude__c': u'55.14244634', u'License_Address_for_Business_Directroy__c': u'Unit No: G07<br>Platinum Tower<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XKGtEAO', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a031000000Li7cUAAR', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000jkjRyAAI', u'type': u'Account'}, u'Id': u'0011000000jkjRyAAI', u'Name': u'101 PARATHAS DMCC'}, u'License_Address__c': u'Unit No: G07<br>Platinum Tower<br>Plot No: JLT-PH1-I2<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'Id': u'a031000000Li7cUAAR', u'Latitude__c': u'25.06927734', u'Account__r': {u'Company_Official_Email_Address__c': u'pankajpathak@hotmail.com', u'Monday_To__c': u'23:00', u'Tuesday_To__c': u'23:00', u'Thursday_To__c': u'23:00', u'Id': u'0011000000jkjRyAAI', u'Operating_Time_from_regular__c': u'09:00', u'Name': u'101 PARATHAS DMCC', u'Tuesday_From__c': u'09:00', u'Twitter_Name__c': u'www.twitter.com/101parathas', u'Saturday_From__c': u'09:00', u'Phone_BD__c': 
u'+97144249950', u'Facebook_Link__c': u'www.facebook.com/101parathas', u'Wednesday_To__c': u'23:00', u'Operating_Time_to_regular__c': u'23:00', u'Friday_To__c': u'23:00', u'Friday_From__c': u'09:00', u'Monday_From__c': u'09:00', u'Saturday_To__c': u'23:00', u'Company_Website_Address__c': u'www.101parathas.com', u'Wednesday_From__c': u'09:00', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000jkjRyAAI', u'type': u'Account'}, u'Thursday_From__c': u'09:00', u'Publishing_agreement_for_BD__c': u'Publish all details in DMCC online/printed content'}}, {u'Account__c': u'0011000000kWyCsAAK', u'License_Status__c': u'Active', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XHXzEAO', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XHXzEAO', u'Function_Type_Class__c': u'Flexi Desk', u'Name': u'PL-004825'}, u'Account__r': {u'Saturday_To__c': u'Closed', u'Tuesday_From__c': u'09:00', u'Friday_From__c': u'Closed', u'Phone_BD__c': u'+971529859280', u'Monday_From__c': u'09:00', u'Monday_To__c': u'18:00', u'Publishing_agreement_for_BD__c': u'Publish only name and address in DMCC online/printed content', u'Company_Official_Email_Address__c': u'csaplala@lkmbgroup.com', u'Company_Website_Address__c': u'www', u'Operating_Time_to_regular__c': u'18:00', u'Friday_To__c': u'Closed', u'Tuesday_To__c': u'18:00', u'Wednesday_To__c': u'18:00', u'Wednesday_From__c': u'09:00', u'Thursday_To__c': u'18:00', u'Saturday_From__c': u'Closed', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000kWyCsAAK', u'type': u'Account'}, u'Thursday_From__c': u'09:00', u'Id': u'0011000000kWyCsAAK', u'Operating_Time_from_regular__c': u'09:00', u'Name': u'1682 CONSULTING DMCC'}, u'Building__c': u'55.13646334', u'Property_Location__c': u'a1G10000000XHXzEAO', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a031000000N2n8DAAR', u'type': u'License__c'}, u'License_Address__c': u'Unit No: 
3O-01-1057<br>Jewellery &amp; Gemplex 3<br>Plot No: DMCC-PH2-J&amp;GPlexS<br>Jewellery &amp; Gemplex<br>Dubai<br>United Arab Emirates', u'Id': u'a031000000N2n8DAAR'}, {u'Account__c': u'0011000000riesbAAA', u'Building__c': u'55.13646334', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XHGNEA4', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XHGNEA4', u'Function_Type_Class__c': u'Flexi Desk', u'Name': u'PL-003733'}, u'Operating_Name__c': u'00110000012j7wXAAQ', u'Account__r': {u'Saturday_To__c': u'Closed', u'Tuesday_From__c': u'09:00', u'Friday_From__c': u'Closed', u'Phone_BD__c': u'+971502473000', u'Monday_From__c': u'09:00', u'Monday_To__c': u'13:00', u'Publishing_agreement_for_BD__c': u'Publish only name and address in DMCC online/printed content', u'Company_Official_Email_Address__c': u'thaer@1765hospitlity.com', u'Company_Website_Address__c': u'www.1765hospitality.com', u'Operating_Time_to_regular__c': u'13:00', u'Friday_To__c': u'Closed', u'Tuesday_To__c': u'13:00', u'Wednesday_To__c': u'13:00', u'Wednesday_From__c': u'09:00', u'Thursday_To__c': u'13:00', u'Saturday_From__c': u'Closed', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000riesbAAA', u'type': u'Account'}, u'Thursday_From__c': u'09:00', u'Id': u'0011000000riesbAAA', u'Operating_Time_from_regular__c': u'09:00', u'Name': u'1765 HOSPITALITY DMCC'}, u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XHGNEA4', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a031000000TaAN9AAN', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/00110000012j7wXAAQ', u'type': u'Account'}, u'Id': u'00110000012j7wXAAQ', u'Name': u'SPIKY HOUSE OF CHICKEN'}, u'License_Address__c': u'Unit No: 3O-01-398<br>Jewellery &amp; Gemplex 3<br>Plot No: DMCC-PH2-J&amp;GPlexS<br>Jewellery &amp; Gemplex<br>Dubai<br>United Arab Emirates', u'Id': 
u'a031000000TaAN9AAN'}, {u'Account__c': u'0011000000xj8H3AAI', u'Building__c': u'55.15331119', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000001YJ39EAG', u'type': u'Property_Location__c'}, u'Id': u'a1G10000001YJ39EAG', u'Function_Type_Class__c': u'Office', u'Name': u'PL-180144'}, u'Longitude__c': u'55.15331119', u'License_Address_for_Business_Directroy__c': u'Unit No: 2703-B<br>Jumeirah Bay Tower X3<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000001YJ39EAG', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a031000000YZaeMAAT', u'type': u'License__c'}, u'License_Address__c': u'Unit No: 2703-B<br>Jumeirah Bay Tower X3<br>Plot No: JLT-PH2-X3A<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'Id': u'a031000000YZaeMAAT', u'Latitude__c': u'25.08018592', u'Account__r': {u'Company_Official_Email_Address__c': u'lobo.rosario@alwadiholding.com', u'Monday_To__c': u'20:00', u'Tuesday_To__c': u'20:00', u'Thursday_To__c': u'20:00', u'Operating_Time_from_regular__c': u'07:00', u'Name': u'1851 LAUNDRY AND DRY CLEANING SERVICES DMCC', u'Website': u'www1851laundries.com', u'Tuesday_From__c': u'07:00', u'Saturday_From__c': u'07:00', u'Phone_BD__c': u'+971508669551', u'Friday_To__c': u'20:00', u'Wednesday_To__c': u'20:00', u'Operating_Time_to_regular__c': u'20:00', u'Saturday_To__c': u'20:00', u'Friday_From__c': u'14:00', u'Monday_From__c': u'07:00', u'Publishing_agreement_for_BD__c': u'Publish all details in DMCC online/printed content', u'Company_Website_Address__c': u'www.1851laundries.com', u'Id': u'0011000000xj8H3AAI', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000xj8H3AAI', u'type': u'Account'}, u'Thursday_From__c': u'07:00', u'Wednesday_From__c': u'07:00'}}, {u'Account__c': u'0011000000rhvGWAAY', u'License_Status__c': u'Active', u'Property_Location__r': {u'attributes': {u'url': 
u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XHwFEAW', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XHwFEAW', u'Function_Type_Class__c': u'Flexi Desk', u'Name': u'PL-006329'}, u'Account__r': {u'Saturday_To__c': u'20:00', u'Tuesday_From__c': u'08:00', u'Friday_From__c': u'08:00', u'Phone_BD__c': u'+971562160241', u'Monday_From__c': u'08:00', u'Monday_To__c': u'20:00', u'Publishing_agreement_for_BD__c': u'Publish only name and address in DMCC online/printed content', u'Company_Official_Email_Address__c': u'l.olivari@gmail.com', u'Company_Website_Address__c': u'N.A.', u'Operating_Time_to_regular__c': u'20:00', u'Friday_To__c': u'20:00', u'Tuesday_To__c': u'20:00', u'Wednesday_To__c': u'20:00', u'Wednesday_From__c': u'08:00', u'Thursday_To__c': u'20:00', u'Saturday_From__c': u'08:00', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/0011000000rhvGWAAY', u'type': u'Account'}, u'Thursday_From__c': u'08:00', u'Id': u'0011000000rhvGWAAY', u'Operating_Time_from_regular__c': u'08:00', u'Name': u'19 FAMILY & BUSINESS DMCC'}, u'Building__c': u'55.13646334', u'Property_Location__c': u'a1G10000000XHwFEAW', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a031000000RhyojAAB', u'type': u'License__c'}, u'License_Address__c': u'Unit No: 3O-01-654<br>Jewellery &amp; Gemplex 3<br>Plot No: DMCC-PH2-J&amp;GPlexS<br>Jewellery &amp; Gemplex<br>Dubai<br>United Arab Emirates', u'Id': u'a031000000RhyojAAB'}, {u'Account__c': u'001b000000MV2kHAAT', u'Building__c': u'Tiffany Towers', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XJi2EAG', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XJi2EAG', u'Function_Type_Class__c': u'Office', u'Name': u'PL-010697'}, u'Operating_Name__c': u'001b000000MV2kHAAT', u'Longitude__c': u'55.14960835', u'License_Address_for_Business_Directroy__c': u'Unit No: 403<br>Tiffany Towers<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', 
u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XJi2EAG', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a03b0000006h21sAAA', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV2kHAAT', u'type': u'Account'}, u'Id': u'001b000000MV2kHAAT', u'Name': u'21ST CENTURY GROUP HOLDINGS LIMITED (BRANCH)'}, u'License_Address__c': u'Unit No: 403<br>Tiffany Towers<br>Plot No: JLT-PH2-W2A<br>Jumeirah Lakes Towers<br>Dubai<br>UAE', u'Id': u'a03b0000006h21sAAA', u'Latitude__c': u'25.07726334', u'Account__r': {u'Saturday_To__c': u'Closed', u'Tuesday_From__c': u'08:00', u'Friday_From__c': u'Closed', u'Phone_BD__c': u'+97144269021', u'Monday_From__c': u'08:00', u'Monday_To__c': u'17:00', u'Publishing_agreement_for_BD__c': u'Publish all details in DMCC online/printed content', u'Company_Official_Email_Address__c': u'bhavani@areefinvestments.com', u'Company_Website_Address__c': u'www.areefinvestments.com', u'Operating_Time_to_regular__c': u'17:00', u'Friday_To__c': u'Closed', u'Tuesday_To__c': u'17:00', u'Wednesday_To__c': u'17:00', u'Wednesday_From__c': u'08:00', u'Thursday_To__c': u'17:00', u'Saturday_From__c': u'Closed', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV2kHAAT', u'type': u'Account'}, u'Thursday_From__c': u'08:00', u'Id': u'001b000000MV2kHAAT', u'Operating_Time_from_regular__c': u'08:00', u'Name': u'21ST CENTURY GROUP HOLDINGS LIMITED (BRANCH)'}}, {u'Account__c': u'001b000000MV3NPAA1', u'Building__c': u'55.13603472', u'Property_Location__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Property_Location__c/a1G10000000XHF4EAO', u'type': u'Property_Location__c'}, u'Id': u'a1G10000000XHF4EAO', u'Function_Type_Class__c': u'Flexi Desk', u'Name': u'PL-003652'}, u'Operating_Name__c': u'001b000000MV3NPAA1', u'Account__r': {u'Saturday_To__c': u'Closed', u'Tuesday_From__c': u'10:00', u'Friday_From__c': u'Closed', 
u'Phone_BD__c': u'+971556099922', u'Monday_From__c': u'10:00', u'Monday_To__c': u'16:00', u'Publishing_agreement_for_BD__c': u'Publish only name and address in DMCC online/printed content', u'Company_Official_Email_Address__c': u'sameh@237communications.com', u'Company_Website_Address__c': u'237communications.com', u'Operating_Time_to_regular__c': u'16:00', u'Friday_To__c': u'Closed', u'Tuesday_To__c': u'16:00', u'Wednesday_To__c': u'16:00', u'Wednesday_From__c': u'10:00', u'Thursday_To__c': u'16:00', u'Saturday_From__c': u'Closed', u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV3NPAA1', u'type': u'Account'}, u'Thursday_From__c': u'10:00', u'Id': u'001b000000MV3NPAA1', u'Operating_Time_from_regular__c': u'10:00', u'Name': u'237 COMMUNICATIONS DMCC'}, u'License_Status__c': u'Active', u'Property_Location__c': u'a1G10000000XHF4EAO', u'attributes': {u'url': u'/services/data/v37.0/sobjects/License__c/a03b0000006h16xAAA', u'type': u'License__c'}, u'Operating_Name__r': {u'attributes': {u'url': u'/services/data/v37.0/sobjects/Account/001b000000MV3NPAA1', u'type': u'Account'}, u'Id': u'001b000000MV3NPAA1', u'Name': u'237 COMMUNICATIONS DMCC'}, u'License_Address__c': u'Unit No: 2H-05-412<br>Jewellery &amp; Gemplex 2<br>Plot No: DMCC-PH2-J&amp;GPlexS<br>Jewellery &amp; Gemplex<br>DUBAI<br>United Arab Emirates', u'Id': u'a03b0000006h16xAAA'}], u'result_count': 0}
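The response above is a single page (`page_num` is "1" in the request). To walk through several pages, you could call the endpoint in a loop and stop at the first page that returns no records. This is a sketch resting on an untested assumption: that requesting a page past the end returns an empty `sObjects` list. (`result_count` is 0 in the output above even though records came back, so it does not look usable as a stop condition.) The `fetch_page` argument and the `iter_records` name are hypothetical; passing the request in as a callable keeps the loop independent of `requests` and easy to test.

```python
def iter_records(fetch_page, start=1, max_pages=100):
    """Yield records from consecutive pages until a page has no records.

    fetch_page(page_num) is a hypothetical callable that must return the
    decoded JSON for that page (a dict with an "sObjects" list).
    max_pages caps the loop in case an empty page never comes back.
    """
    for page_num in range(start, start + max_pages):
        records = fetch_page(page_num).get("sObjects") or []
        if not records:
            break  # assumed to mean we are past the last page
        for record in records:
            yield record
```

Inside the `with Session() as s:` block, `fetch_page` could wrap `s.get(js_url, params=...).json()` with `page_num` set to the requested page number.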

What you want is in data[u'sObjects']. To get the company websites:

for d in data['sObjects']:
    if 'Company_Website_Address__c' in d['Account__r']:
        print(d['Account__r']['Company_Website_Address__c'])

Which gives you:

www.1on1hrconsulting.com
www.1platinumconcierge.com
www.1000heads.com
www.101parathas.com
www
www.1765hospitality.com
www.1851laundries.com
N.A.
www.areefinvestments.com
237communications.com

You can see that some companies don't have a website listed, so you will have to decide what to do with those. If you just print d[u'Account__r'], you can see all the info for each company. You should also be aware that this is an internal API, so make sure you are not violating their terms of service by scraping their site, although they should probably enforce their authToken logic more strictly if they don't want the API to be scraped so easily. You can see the authToken in Chrome's developer tools when the page makes the request, but it is not required.
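One way to deal with the missing or placeholder values is to normalize them while extracting. The sample output contains `www` and `N.A.` as placeholders, values with no scheme, and one record (1851 LAUNDRY) that also carries a separate `Website` field. The sketch below is a heuristic built on those observations, not a documented format: it tries `Company_Website_Address__c` and then `Website`, drops placeholders and values without a dot, and prepends `http://` when no scheme is present. The names `normalize_website` and `extract_websites` are hypothetical.

```python
# Placeholder values: "www" and "N.A." appear in the sample output;
# the other entries are guesses at similar filler values.
PLACEHOLDERS = {"", "www", "n.a.", "na", "n/a", "none", "-"}


def normalize_website(value):
    """Return an http(s) URL for a raw directory value, or None if unusable."""
    if not value:
        return None
    value = value.strip()
    if value.lower() in PLACEHOLDERS:
        return None
    if value.lower().startswith(("http://", "https://")):
        return value
    # Heuristic: a bare domain needs at least one dot and no spaces.
    if "." not in value or " " in value:
        return None
    return "http://" + value


def extract_websites(records):
    """Return (company name, URL) pairs from data['sObjects'], skipping unusable entries."""
    results = []
    for record in records:
        account = record.get("Account__r") or {}
        # Take the first field that yields a usable URL.
        for key in ("Company_Website_Address__c", "Website"):
            url = normalize_website(account.get(key))
            if url:
                results.append((account.get("Name"), url))
                break
    return results
```

With the answer's `data`, you could then run `for name, url in extract_websites(data['sObjects']): print("%s: %s" % (name, url))`, which works under both Python 2.7 and 3. Note that the heuristic cannot catch typos such as `www1851laundries.com`.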

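Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.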

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM