
Final dataframe from web-scraping multiple pages

I would like to create a pandas dataframe that includes all rows fulfilling a condition (and I managed to do that part), scraped from a multi-page website. But the final result is a dataframe containing only the rows from the last page of the range I declared in the loop, instead of the results from all pages. I would be very grateful if someone could point out where the mistake is.

import requests
import pandas
from bs4 import BeautifulSoup

headers = {'User-Agent': 'Mozilla/5.0'}

for num in range(1, 3):
    url = 'https://biznes.interia.pl/gieldy/notowania-gpw/profil-akcji-grn,wId,7380,tab,przebieg-sesji,pack,{}'.format(num)

    response = requests.get(url, headers=headers)
    content = response.content
    soup = BeautifulSoup(content, "html.parser")

    notow = soup.find_all('table', class_='business-table-trading-table')
    # on a given page, select only the rows containing the word "Transakcja"
    rows = notow[0].select('tr:has(td:contains("TRANSAKCJA"))')

    data = []

    for row in rows:
        cols = row.find_all('td')

        cols = [ele.text.strip() for ele in cols]

        cols = data.append([ele for ele in cols if ele])


# final dataframe, which should contain the results from all scraped pages
df = pandas.DataFrame(data)

print(df)

Put the line `data = []` outside the loop.

As written, the list `data` that collects the extracted items is re-initialized to an empty list on every iteration of the outer loop, which erases everything extracted from the earlier pages; only the last page's rows survive into the DataFrame.

In general, avoid initializing a variable inside a loop unless the variable is used only inside that loop.
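The corrected structure can be sketched as follows. To keep the example self-contained, the network requests are replaced with two inline HTML snippets standing in for the scraped pages; the table markup here is illustrative, not the site's actual structure, and the selector uses soupsieve's current `:-soup-contains` name for the deprecated `:contains`:

```python
import pandas
from bs4 import BeautifulSoup

# Stand-ins for the pages fetched by requests.get() in the original loop.
pages = [
    '<table><tr><td>TRANSAKCJA</td><td>100</td></tr></table>',
    '<table><tr><td>TRANSAKCJA</td><td>200</td></tr></table>',
]

data = []  # initialized ONCE, before the loop, so rows accumulate across pages

for content in pages:
    soup = BeautifulSoup(content, 'html.parser')
    rows = soup.select('tr:has(td:-soup-contains("TRANSAKCJA"))')
    for row in rows:
        cols = [td.text.strip() for td in row.find_all('td')]
        data.append([ele for ele in cols if ele])

# built once, after the loop: contains rows from every page, not just the last
df = pandas.DataFrame(data)
print(df)
```

The same two moves apply to the original script unchanged: hoist `data = []` above `for num in range(1, 3):` and build the DataFrame after the loop finishes.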
