
Using the `.find_next_siblings` function in Beautiful Soup

I'm trying to write the output of a web scrape to a CSV file. Here is my code:

import bs4
import requests
import csv

#get webpage for Apple inc. September income statement
page = requests.get("https://au.finance.yahoo.com/q/is?s=AAPL")

#put into beautiful soup
soup = bs4.BeautifulSoup(page.content)

#select table that holds data of interest
table = soup.find("table", class_="yfnc_tabledata1")

#creates headers for table
headers = table.find('tr', class_="yfnc_modtitle1")

#creates generator that holds four values that are yearly revenues for company
total_revenue = headers.next_sibling
cost_of_revenue = total_revenue.next_sibling
gross_profit = cost_of_revenue.next_sibling.next_sibling
wang = headers.find_next_siblings("tr")

#iterates through generator from above and writes output to CSV file
with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile, delimiter="|")
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in headers])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in total_revenue])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in cost_of_revenue])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in gross_profit])
    for dude in wang:
        writer.writerow([dude.get_text(strip=True).encode("utf-8")])

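The problem is that I repeat a lot of code when building each row and writing it to the CSV. As you can see, I keep chaining next_sibling to get the next row of values. I found the .find_next_siblings() function in Beautiful Soup, which does almost what I want, but every row it returns is written to a single cell of the CSV file.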

Any ideas? Let me know if the question is unclear.

Thanks.
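The single-cell symptom described in the question most likely comes from how `csv.writer.writerow` treats its argument: each element of the list becomes one cell, so a list holding one string for the whole row yields one cell. Here is a minimal sketch of that behaviour using only the standard library. The `cells` values and the `to_csv_line` helper are hypothetical stand-ins, not part of the original code.

```python
import csv
import io

# Hypothetical cell values standing in for one scraped table row.
cells = ["Total Revenue", "42,123,000", "37,432,000"]


def to_csv_line(fields):
    """Write one row with a '|' delimiter and return the resulting line."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="|").writerow(fields)
    return buffer.getvalue().rstrip("\r\n")


# A list with ONE string (the text of the whole row) gives one cell.
one_cell = to_csv_line(["".join(cells)])

# A list with one string PER <td> gives one cell per value.
many_cells = to_csv_line(cells)

print(one_cell)
print(many_cells)
```

Calling `get_text()` on a whole `tr` returns the concatenated text of all its cells, which matches the first case. Iterating over the row's `td` elements matches the second.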

OK, I don't think this is a perfect solution, but the idea is to check the amounts in the following siblings and skip rows that contain none:

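import re

# One list of cell texts per <tr> that follows the header row
# (the .encode("utf-8") calls assume Python 2; drop them on Python 3)
next_rows = [[td.get_text(strip=True).encode("utf-8") for td in row('td')]
             for row in headers.find_next_siblings("tr")]

# Keep only cells made up of digits and commas, then drop rows left empty
pattern = re.compile(r'^[\d,]+$')
data = [[item for item in l if pattern.match(item)] for l in next_rows]
data = [l for l in data if l]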

with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile, delimiter="|")
    writer.writerows(data)

This produces:

42,123,000|37,432,000|45,646,000|57,594,000
26,114,000|22,697,000|27,699,000|35,748,000
16,009,000|14,735,000|17,947,000|21,846,000
1,686,000|1,603,000|1,422,000|1,330,000
3,158,000|2,850,000|2,932,000|3,053,000
11,165,000|10,282,000|13,593,000|17,463,000
307,000|202,000|225,000|246,000
11,472,000|10,484,000|13,818,000|17,709,000
11,472,000|10,484,000|13,818,000|17,709,000
3,005,000|2,736,000|3,595,000|4,637,000
8,467,000|7,748,000|10,223,000|13,072,000
8,467,000|7,748,000|10,223,000|13,072,000
8,467,000|7,748,000|10,223,000|13,072,000

These are basically all the amounts in the table.
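If the row labels and the header row should also end up in the CSV, a variation along the following lines may help. It is only a sketch under assumptions: it targets Python 3, and it runs against a small inline HTML snippet whose structure and labels are hypothetical, since the real page may be laid out differently. It keeps every non-empty cell instead of filtering with a regular expression.

```python
import csv
import io

import bs4

# Hypothetical markup loosely modelled on the table in the question;
# the real page may be structured differently.
html = """
<table class="yfnc_tabledata1">
  <tr class="yfnc_modtitle1"><td>Period Ending</td><td>Period 1</td><td>Period 2</td></tr>
  <tr><td>Total Revenue</td><td>42,123,000</td><td>37,432,000</td></tr>
  <tr><td colspan="3"></td></tr>
  <tr><td>Cost of Revenue</td><td>26,114,000</td><td>22,697,000</td></tr>
</table>
"""

soup = bs4.BeautifulSoup(html, "html.parser")
headers = soup.find("tr", class_="yfnc_modtitle1")

rows = []
# Treat the header row and every following <tr> sibling the same way.
for row in [headers, *headers.find_next_siblings("tr")]:
    # One list element per <td>, so each value gets its own CSV cell.
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    cells = [cell for cell in cells if cell]  # drop empty cells
    if cells:  # skip spacer rows with no text at all
        rows.append(cells)

# Written to an in-memory buffer here; a real file works the same way.
buffer = io.StringIO()
csv.writer(buffer, delimiter="|", lineterminator="\n").writerows(rows)
print(buffer.getvalue())
```

To write to a file instead of the buffer on Python 3, open it with `newline=""`, as the `csv` module documentation recommends.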

