Using the `.find_next_siblings` function in Beautiful Soup

Question

I am trying to write the output of a web scrape to a CSV file. Here is my code:

import bs4
import requests
import csv

#get webpage for Apple inc. September income statement
page = requests.get("https://au.finance.yahoo.com/q/is?s=AAPL")

#put into beautiful soup
soup = bs4.BeautifulSoup(page.content)

#select table that holds data of interest
table = soup.find("table", class_="yfnc_tabledata1")

#creates headers for table
headers = table.find('tr', class_="yfnc_modtitle1")

#creates generator that holds four values that are yearly revenues for company
total_revenue = headers.next_sibling
cost_of_revenue = total_revenue.next_sibling
gross_profit = cost_of_revenue.next_sibling.next_sibling
wang = headers.find_next_siblings("tr")

#iterates through generator from above and writes output to CSV file
with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile,delimiter="|")
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in headers])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in total_revenue])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in cost_of_revenue])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in gross_profit])
    for dude in wang:
        writer.writerow([dude.get_text(strip=True).encode("utf-8")])

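The problem is that I repeat a lot of code when creating each row and writing it to the CSV. As you can see, I keep chaining next_sibling to get the next row of values. I found the .find_next_siblings() function in Beautiful Soup, which almost does what I want, but every row it reads is written into a single cell of the CSV file.

Any ideas? Let me know if the question is unclear.

Thanks.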

Answer 1

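OK, I suppose this is not a perfect solution, but the idea is to collect the cells of every following sibling row, keep only the cells that look like amounts, and skip the rows that end up with none: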

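import re

# One list of cell texts per <tr> that follows the header row.
# Python 2: cells are encoded to UTF-8 byte strings for csv.writer.
next_rows = [[td.get_text(strip=True).encode("utf-8") for td in row('td')]
             for row in headers.find_next_siblings("tr")]

# Keep only cells made up of digits and commas (the amounts), then drop empty rows.
pattern = re.compile(r'^[\d,]+$')
data = [[item for item in l if pattern.match(item)] for l in next_rows]
data = [l for l in data if l]

with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile, delimiter="|")
    writer.writerows(data)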

This produces:

42,123,000|37,432,000|45,646,000|57,594,000
26,114,000|22,697,000|27,699,000|35,748,000
16,009,000|14,735,000|17,947,000|21,846,000
1,686,000|1,603,000|1,422,000|1,330,000
3,158,000|2,850,000|2,932,000|3,053,000
11,165,000|10,282,000|13,593,000|17,463,000
307,000|202,000|225,000|246,000
11,472,000|10,484,000|13,818,000|17,709,000
11,472,000|10,484,000|13,818,000|17,709,000
3,005,000|2,736,000|3,595,000|4,637,000
8,467,000|7,748,000|10,223,000|13,072,000
8,467,000|7,748,000|10,223,000|13,072,000
8,467,000|7,748,000|10,223,000|13,072,000

These are basically all the amounts in the table.
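
If the row labels should be kept as well, a variation along these lines might work. In the question's loop, `get_text()` is called on the whole `<tr>` and the result is wrapped in a one-element list, which is why each row lands in a single cell; building one list entry per `<td>` gives one column per cell. This is only a minimal sketch for Python 3: `SAMPLE_HTML` is a made-up stand-in for the real page (the header cell texts in it are invented), `rows_after` is a hypothetical helper name, and the CSV is written to an in-memory buffer instead of a file.

```python
import csv
import io

import bs4

# Made-up stand-in for the income statement table, not the real page.
# Only the class names and the amounts are taken from the thread.
SAMPLE_HTML = """
<table class="yfnc_tabledata1">
  <tr class="yfnc_modtitle1"><td>Item</td><td>Period A</td><td>Period B</td></tr>
  <tr><td>Total Revenue</td><td>42,123,000</td><td>37,432,000</td></tr>
  <tr><td></td></tr>
  <tr><td>Cost of Revenue</td><td>26,114,000</td><td>22,697,000</td></tr>
</table>
"""


def rows_after(header):
    """Return one list of cell strings per <tr> that follows `header`."""
    rows = []
    for row in header.find_next_siblings("tr"):
        # One list entry per <td>, so each cell becomes its own CSV column.
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        # Skip spacer rows whose cells are all empty.
        if any(cells):
            rows.append(cells)
    return rows


soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="yfnc_tabledata1")
header = table.find("tr", class_="yfnc_modtitle1")
rows = rows_after(header)

# Write to an in-memory buffer so the sketch has no side effects on disk.
buffer = io.StringIO()
csv.writer(buffer, delimiter="|", lineterminator="\n").writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to a real file on Python 3, the buffer could be swapped for `open(path, "a", newline="")`, and no `.encode("utf-8")` call is needed because `csv.writer` expects text there.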

Solution 1
0 Accepted 2014-12-06 08:55:43