Next page with Python BeautifulSoup

I am new to Python and stuck on the 'next page' logic.

I tried a while loop and Selenium with Chrome, but nothing worked.

Please shed some light on this:

import requests
from bs4 import BeautifulSoup
import csv 

pages = [ 0 , 25 , 50 , 75]
for page in pages:
    source = requests.get('https://finance.yahoo.com/screener/predefined/day_gainers?count=25&offset={}'.format(page)).text

soup = BeautifulSoup(source , 'lxml') 

for link in soup.find_all("a"):
    table = soup.find("table",{"class":"W(100%)"})
    thead = table.find("thead").find_all("th")
    table_head = [th.text for th in thead]
    #print(table_head)

    table_body = table.find ("tbody").find_all("tr")
        
with open("report.csv" , "a" , newline="") as csv_file:
        csv_write = csv.writer(csv_file)
        csv_write.writerow(table_head)
        
        for tr in table_body:
            table_data = [td.text.strip() for td in tr.find_all('td') ]
            csv_write.writerow(table_data)

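I think you only need to indent your code, and then it works fine. As posted, only the requests.get line is inside the `for page in pages` loop, so the parsing and the CSV writing run once, on the last page fetched. Here is the code: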

import requests
from bs4 import BeautifulSoup
import csv

pages = [0, 25, 50, 75]
for page in pages:
    source = requests.get('https://finance.yahoo.com/screener/predefined/day_gainers?count=25&offset={}'.format(page)).text
    soup = BeautifulSoup(source, 'lxml')

    for link in soup.find_all("a"):
        table = soup.find("table", {"class": "W(100%)"})
        thead = table.find("thead").find_all("th")
        table_head = [th.text for th in thead]
        #print(table_head)

        table_body = table.find("tbody").find_all("tr")

        with open("report.csv", "a", newline="") as csv_file:
            csv_write = csv.writer(csv_file)
            csv_write.writerow(table_head)

            for tr in table_body:
                table_data = [td.text.strip() for td in tr.find_all('td')]
                csv_write.writerow(table_data)
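
To see why the indentation matters, here is a minimal sketch with no network access. The strings stand in for the HTML that requests.get would return, and the names `processed_once` and `processed_each` are made up for illustration:

```python
pages = [0, 25, 50, 75]

# Processing step NOT indented under the loop: the loop only reassigns
# `source`, so the step below runs once and sees just the last page.
processed_once = []
for page in pages:
    source = "html for offset {}".format(page)  # stand-in for requests.get(...).text
processed_once.append(source)

# Processing step indented under the loop: it runs for every page.
processed_each = []
for page in pages:
    source = "html for offset {}".format(page)
    processed_each.append(source)

print(processed_once)  # one entry, for the last offset only
print(processed_each)  # one entry per offset
```

The same applies to the scraper: anything that should happen for every page (parsing, finding the table, writing rows) has to sit inside the `for page in pages` block.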

Edit: The second for loop (`for link in soup.find_all("a")`) repeats the same work for every link on the page, so the table is written to the file many times and you get duplicate rows. Remove that loop. Here is the edited code:

import requests
from bs4 import BeautifulSoup
import csv

pages = [0, 25, 50, 75]  # offsets, 25 rows per page
for page in pages:
    # Fetch and parse one page of results per offset
    source = requests.get('https://finance.yahoo.com/screener/predefined/day_gainers?count=25&offset={}'.format(page)).text
    soup = BeautifulSoup(source, 'lxml')

    # Locate the results table, then collect its header cells and body rows
    table = soup.find("table", {"class": "W(100%)"})
    thead = table.find("thead").find_all("th")
    table_head = [th.text for th in thead]
    table_body = table.find("tbody").find_all("tr")

    # Append this page to the CSV (the header row is written once per page)
    with open("report.csv", "a", newline="") as csv_file:
        csv_write = csv.writer(csv_file)
        csv_write.writerow(table_head)
        for tr in table_body:
            table_data = [td.text.strip() for td in tr.find_all('td')]
            csv_write.writerow(table_data)
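
One thing the code above still does is write the header row again for every page, and because the file is opened in "a" mode, rows pile up across runs. If that is not wanted, a possible variation is to open the output once and write the header only the first time. The sketch below is only an illustration under assumptions: `write_pages` and `demo_pages` are hypothetical names, the sample rows are invented, and an in-memory buffer stands in for report.csv so it runs without a network or a file.

```python
import csv
import io


def write_pages(csv_file, pages):
    """Write (header, rows) pairs to csv_file, emitting the header only once."""
    writer = csv.writer(csv_file)
    header_written = False
    for header, rows in pages:
        if not header_written:
            writer.writerow(header)
            header_written = True
        writer.writerows(rows)


# Same offsets as the hard-coded list, generated instead of typed out.
offsets = list(range(0, 100, 25))

# Invented sample data standing in for (table_head, rows) scraped per page.
demo_pages = [
    (["Symbol", "Name"], [["AAA", "Alpha Corp"], ["BBB", "Beta Inc"]]),
    (["Symbol", "Name"], [["CCC", "Gamma Ltd"]]),
]

buffer = io.StringIO()  # stand-in for open("report.csv", "w", newline="")
write_pages(buffer, demo_pages)
lines = buffer.getvalue().splitlines()
print(lines)
```

In the scraper, each pair would be the `table_head` list and the list of `table_data` rows built for one page, and `csv_file` would come from a single `open("report.csv", "w", newline="")` placed around the page loop.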
