
Printing table values from specific web page

I want to extract and print all the entries for a specific month from the table.

import os
from webdriver_manager.chrome import ChromeDriverManager
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument('--ignore-certificate-errors')
options.add_argument('--start-maximized')
options.page_load_strategy = 'eager'

driver = webdriver.Chrome(options=options)

wait = WebDriverWait(driver, 20)   
driver.get("https://www.sebi.gov.in/sebiweb/home/HomeAction.do?doListing=yes&sid=3&ssid=22&smid=18")

month = "Apr"
year = "2021"

How do I print all the values from the table that match a specific month and year?

You can try something like this:

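from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.sebi.gov.in/sebiweb/home/HomeAction.do?doListing=yes&sid=3&ssid=22&smid=18')

month = "Apr"
year = "2021"

# The first cell of each row holds the date, e.g. "Apr 29, 2021".
# find_elements(By.XPATH, ...) replaces find_elements_by_xpath, which was removed in Selenium 4.3.
for row in driver.find_elements(By.XPATH, "//table/tbody/tr/td[1]"):
    if month in row.text and year in row.text:
        # The next sibling cell holds the announcement title.
        x = row.find_element(By.XPATH, "./following-sibling::td")
        print(row.text, " ", x.text)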

Prints:

Apr 29, 2021   Rane Brake Lining Ltd. - Post Buyback Public Announcement
Apr 06, 2021   Insecticides (India) Limited - Public Announcement
Apr 06, 2021   Jagran Prakashan Limited - Filing of Public Announcement
Apr 05, 2021   Sreeleathers Limited - Post Buyback Public Announcement

Of course, that only gets the results on the first page; you would need to handle pagination if you wanted more than that.
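A side note on the month and year check in the loop above: it is a plain substring test on the date cell. If you would rather compare real dates, here is a minimal sketch that parses the cell text first. It assumes every date looks like "Apr 29, 2021", as in the output shown; the helper name matches_month and the "Mar 31, 2021" row are made up for illustration and are not taken from the page.

```python
from datetime import datetime


def matches_month(date_text, month, year):
    """Return True if date_text (e.g. 'Apr 29, 2021') falls in the given month and year."""
    try:
        parsed = datetime.strptime(date_text.strip(), "%b %d, %Y")
    except ValueError:
        # Not in the assumed "Mon DD, YYYY" format, so treat it as no match.
        return False
    return parsed.strftime("%b") == month and parsed.year == int(year)


# Sample (date, title) pairs; the last row is hypothetical.
rows = [
    ("Apr 29, 2021", "Rane Brake Lining Ltd. - Post Buyback Public Announcement"),
    ("Apr 06, 2021", "Insecticides (India) Limited - Public Announcement"),
    ("Mar 31, 2021", "Hypothetical Company Ltd. - Public Announcement"),
]

matching = [(d, t) for d, t in rows if matches_month(d, "Apr", "2021")]
for d, t in matching:
    print(d, " ", t)
```

In the Selenium loop you would call matches_month(row.text, month, year) in place of the two "in" checks. Note that "%b" depends on the locale, so this assumes English month abbreviations.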

First of all, set the date range in the page's filters. Then get the page source using data = driver.page_source.

Next, use bs4 to parse your data: soup = BeautifulSoup(data, 'html.parser'). Then loop through the rows with for row in soup.select('div.table-scrollable tbody tr'), and read the date from row.select('td')[0] and the title from row.select('td')[1].
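Put together, the parsing step might look roughly like the sketch below. To keep it runnable without a browser, data is a small hand-written stand-in for driver.page_source; the real page's markup may differ, and the only thing taken from the answer is the div.table-scrollable tbody tr selector and the two td cells.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for driver.page_source, shaped like the table described above.
data = """
<div class="table-scrollable">
  <table>
    <tbody>
      <tr><td>Apr 29, 2021</td><td>Rane Brake Lining Ltd. - Post Buyback Public Announcement</td></tr>
      <tr><td>Apr 06, 2021</td><td>Insecticides (India) Limited - Public Announcement</td></tr>
    </tbody>
  </table>
</div>
"""

soup = BeautifulSoup(data, "html.parser")

entries = []
for row in soup.select("div.table-scrollable tbody tr"):
    cells = row.select("td")
    if len(cells) < 2:
        # Skip rows that do not have both a date and a title cell.
        continue
    date = cells[0].get_text(strip=True)
    title = cells[1].get_text(strip=True)
    entries.append((date, title))
    print(date, " ", title)
```

With a live session you would replace the sample string with data = driver.page_source after the filters have been applied.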

Happy coding.
