
Extract contents from pop-up window with Python Selenium

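I want to extract a person's biography ("John Reinsberg is Deputy Chairman of Lazard Asset Management responsible for oversight of...") from this web page: https://www.morningstar.com/funds/xnas/lziex/people

See the screenshot, for example.

My code does not work because the content is in a pop-up window. Judging from some existing questions, it seems I need to use click() and then find the element in the window. However, I do not know how to locate the element to click. Thanks.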

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument('headless')
driver = webdriver.Chrome(options=options)
driver.get('https://www.morningstar.com/funds/xnas/lziex/people')
element=driver.find_elements_by_xpath('//*[@class="sal-modal-biography ng-binding ng-scope"]')
print(element.text) 

I also tried this, but it did not work:

element =  driver.find_element_by_xpath("//button[@class='sal-icons sal-icons--close mds-button mds-button--icon-only']")
driver.execute_script('arguments[0].click();',element)

driver.switch_to_alert()
print(driver.find_elements_by_xpath('//*[@class="sal-modal-biography ng-binding ng-scope"]'))

Here is part of the HTML.

<div class="sal-component-ctn sal-modal-scrollable" style="display: block;" aria-hidden="true"><div class="sal-component-mip-manager-pop-out reveal-modal mds-modal ng-isolate-scope open" data-reveal="" manager-data="vm.managerData" style="display: block; opacity: 1; visibility: visible; top: 335.333px;" tabindex="0" aria-hidden="false">
    <div class="sal-row">
        <div class="sal-manager-modal">
            <div class="sal-manager-modal__modalHeader" ng-class="{'sal-fixed':vm.fixedHeader}" ng-style="vm.headerStyle" style="height: auto; width: auto;">
                <span class="sal-modal-header__menu">
                    <button class="sal-icons sal-icons--close mds-button mds-button--icon-only" type="button">
                        <svg class="mds-icon mds-button__icon mds-button__icon--left">
                            <use xlink:href="#remove">
                                <title class="ng-binding">Close</title>
                            </use>
                        </svg>
                    </button>
                </span>
                <div class="sal-modal-header__title ng-binding">
                    John R. Reinsberg
                </div>
            </div>
            <div class="sal-manager-modal__body" ng-style="{'margin-top': vm.headerStyle.height}" style="margin-top: auto;">
                <div class="sal-modal-dps">
                    <ul class="sal-xsmall-block-grid-2 small-block-grid-3 medium-block-grid-5 large-block-grid-5">
                                      </ul>
                </div>
                <!-- ngIf: vm.managerModalData.fundManager.biography.managerProvidedBiography || (vm.managerModalData.fundManager.CollegeEducationDetailList && vm.managerModalData.fundManager.CollegeEducationDetailList.length > 0) --><div class="sal-columns sal-small-12 sal-medium-6 sal-large-6 ng-scope" ng-if="vm.managerModalData.fundManager.biography.managerProvidedBiography || (vm.managerModalData.fundManager.CollegeEducationDetailList &amp;&amp; vm.managerModalData.fundManager.CollegeEducationDetailList.length > 0)" ng-class="{'sal-medium-12 sal-large-12': !vm.managerModalData.currentManagedFundList || vm.managerModalData.currentManagedFundList.length === 0}">
                    <!-- ngIf: vm.managerModalData.fundManager.biography.managerProvidedBiography --><div class="sal-modal-biography ng-binding ng-scope" ng-if="vm.managerModalData.fundManager.biography.managerProvidedBiography">
                        <!-- ngIf: !vm.managerModalData.fundManager.biography.isLocalized -->
                        John Reinsberg is Deputy Chairman of Lazard Asset Management responsible for oversight of the firm's international and global strategies. He is also a Portfolio Manager/Analyst on the Global Equity and International Equity portfolio teams. He began working in the investment field in 1981. Prior to joining Lazard in 1992, John was Executive Vice President with General Electric Investment Corporation and Trustee of the General Electric Pension Trust.
                    </div><!-- end ngIf: vm.managerModalData.fundManager.biography.managerProvidedBiography -->

                    </div>
                </div>
            </div>
        </div>
    </div>
</div></div>

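To extract the bio "John Reinsberg is Deputy Chairman of Lazard Asset Management responsible for oversight of..." from the webpage https://www.morningstar.com/funds/xnas/lziex/people, you need to induce WebDriverWait for element_to_be_clickable(), and you can use the following locator strategies: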

  • Code block:

     from selenium import webdriver
     from selenium.webdriver.common.by import By
     from selenium.webdriver.support.ui import WebDriverWait
     from selenium.webdriver.support import expected_conditions as EC

     options = webdriver.ChromeOptions()
     options.add_argument("start-maximized")
     options.add_experimental_option("excludeSwitches", ["enable-automation"])
     options.add_experimental_option('useAutomationExtension', False)
     driver = webdriver.Chrome(options=options, executable_path=r'C:\\Utility\\BrowserDrivers\\chromedriver.exe')
     driver.get("https://www.morningstar.com/funds/xnas/lziex/people")
     # Open the pop-up by clicking the manager's name once it is clickable
     WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='sal-management-team__memberName']/a//span[text()='Reinsberg']/.."))).click()
     # Read the biography once the modal content is visible
     print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.sal-modal-biography.ng-binding.ng-scope"))).text.strip())
  • Console output:

     John Reinsberg is Deputy Chairman of Lazard Asset Management responsible for oversight of the firm's international and global strategies. He is also a Portfolio Manager/Analyst on the Global Equity and International Equity portfolio teams. He began working in the investment field in 1981. Prior to joining Lazard in 1992, John was Executive Vice President with General Electric Investment Corporation and Trustee of the General Electric Pension Trust.
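
The same click-then-wait flow can be wrapped in a small reusable function. The following is only a sketch, not a tested solution for the live site. It assumes Selenium 4.6 or later, where the chromedriver binary is located automatically and the executable_path argument is no longer needed (recent Selenium 4 releases no longer accept it). It also assumes the page still uses the class names shown in the HTML above. The names manager_link_xpath, fetch_manager_bio, PEOPLE_URL and BIO_CSS are hypothetical helpers introduced for illustration. Matching on the single class sal-modal-biography is a design choice: it avoids depending on the Angular-generated ng-binding and ng-scope classes.

```python
# Sketch only: the helper names below are hypothetical, not part of Selenium.

PEOPLE_URL = "https://www.morningstar.com/funds/xnas/lziex/people"

# Match on one stable class instead of the full class attribute string.
BIO_CSS = "div.sal-modal-biography"


def manager_link_xpath(last_name: str) -> str:
    """Build the XPath for the link that wraps a manager's last name."""
    return (
        "//div[@class='sal-management-team__memberName']"
        f"/a//span[text()='{last_name}']/.."
    )


def fetch_manager_bio(last_name: str, url: str = PEOPLE_URL, timeout: int = 20) -> str:
    """Open the page, click the manager's name, and return the pop-up biography."""
    # Selenium is imported here so the locator helper above works without it.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        wait = WebDriverWait(driver, timeout)
        # The biography only exists in the DOM after the name has been clicked.
        wait.until(
            EC.element_to_be_clickable((By.XPATH, manager_link_xpath(last_name)))
        ).click()
        # The modal is ordinary page markup, not a JavaScript alert,
        # so no window or alert switching is needed.
        bio = wait.until(
            EC.visibility_of_element_located((By.CSS_SELECTOR, BIO_CSS))
        )
        return bio.text.strip()
    finally:
        # Always close the browser, even if a wait times out.
        driver.quit()
```

Calling print(fetch_manager_bio("Reinsberg")) should then print the biography shown in the console output, provided the page structure has not changed.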
