Using the `.find_next_siblings` function in Beautiful Soup
I'm trying to write the output of a web scrape to a CSV file. Here is my code:
import bs4
import requests
import csv
#get webpage for Apple inc. September income statement
page = requests.get("https://au.finance.yahoo.com/q/is?s=AAPL")
#put into beautiful soup
soup = bs4.BeautifulSoup(page.content)
#select table that holds data of interest
table = soup.find("table", class_="yfnc_tabledata1")
#creates headers for table
headers = table.find('tr', class_="yfnc_modtitle1")
#creates generator that holds four values that are yearly revenues for company
total_revenue = headers.next_sibling
cost_of_revenue = total_revenue.next_sibling
gross_profit = cost_of_revenue.next_sibling.next_sibling
wang = headers.find_next_siblings("tr")
#iterates through generator from above and writes output to CSV file
with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile,delimiter="|")
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in headers])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in total_revenue])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in cost_of_revenue])
    writer.writerow([value.get_text(strip=True).encode("utf-8") for value in gross_profit])
    for dude in wang:
        writer.writerow([dude.get_text(strip=True).encode("utf-8")])
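The problem is that I repeat a lot of code when building each row and writing it to the CSV. As you can see, I keep chaining next_sibling to get the next row of values. I found the .find_next_siblings() function in Beautiful Soup, which does almost what I want, but every row it reads ends up in a single cell of the CSV file.
Any ideas? Let me know if the question is unclear.
Thanks.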
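Okay, this is probably not a perfect solution, but the idea is to go through the following sibling rows, check each cell for an amount, and skip the rows that contain none: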
import re

# For every row after the header, collect the text of each <td> cell
next_rows = [[td.get_text(strip=True).encode("utf-8") for td in row('td')]
             for row in headers.find_next_siblings("tr")]

# Keep only the cells that look like amounts (digits and commas)
pattern = re.compile(r'^[\d,]+$')
data = [[item for item in l if pattern.match(item)] for l in next_rows]
# Drop rows that contain no amounts at all
data = [l for l in data if l]

with open('/home/kwal0203/Desktop/Apple.csv', 'a') as csvfile:
    writer = csv.writer(csvfile, delimiter="|")
    writer.writerows(data)
This produces:
42,123,000|37,432,000|45,646,000|57,594,000
26,114,000|22,697,000|27,699,000|35,748,000
16,009,000|14,735,000|17,947,000|21,846,000
1,686,000|1,603,000|1,422,000|1,330,000
3,158,000|2,850,000|2,932,000|3,053,000
11,165,000|10,282,000|13,593,000|17,463,000
307,000|202,000|225,000|246,000
11,472,000|10,484,000|13,818,000|17,709,000
11,472,000|10,484,000|13,818,000|17,709,000
3,005,000|2,736,000|3,595,000|4,637,000
8,467,000|7,748,000|10,223,000|13,072,000
8,467,000|7,748,000|10,223,000|13,072,000
8,467,000|7,748,000|10,223,000|13,072,000
These are basically all the amounts in the table.
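For anyone who wants to try the idea without fetching the live page, here is a minimal Python 3 sketch of the same approach. It assumes a small inline HTML table as a stand-in for the Yahoo Finance markup: the class names come from the question, the figures are three rows from the output above, and the header labels, the empty spacer row, and the helper names `extract_amount_rows` and `rows_to_csv` are made up for illustration. Since the `csv` module in Python 3 works with `str`, the `.encode("utf-8")` calls are dropped here.

```python
import csv
import io
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for the real page. Header labels and the empty
# spacer row are invented; the amounts are copied from the output above.
HTML = """
<table class="yfnc_tabledata1">
<tr class="yfnc_modtitle1"><td>Period</td><td>A</td><td>B</td><td>C</td><td>D</td></tr>
<tr><td>Total Revenue</td><td>42,123,000</td><td>37,432,000</td><td>45,646,000</td><td>57,594,000</td></tr>
<tr><td>Cost of Revenue</td><td>26,114,000</td><td>22,697,000</td><td>27,699,000</td><td>35,748,000</td></tr>
<tr><td colspan="5"></td></tr>
<tr><td>Gross Profit</td><td>16,009,000</td><td>14,735,000</td><td>17,947,000</td><td>21,846,000</td></tr>
</table>
"""

# A cell counts as an amount if it consists only of digits and commas.
AMOUNT = re.compile(r"^[\d,]+$")


def extract_amount_rows(html):
    """Return one list of amount strings per table row that has any."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_="yfnc_tabledata1")
    headers = table.find("tr", class_="yfnc_modtitle1")
    rows = []
    for row in headers.find_next_siblings("tr"):
        # Read each <td> separately so every value gets its own CSV cell.
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        amounts = [cell for cell in cells if AMOUNT.match(cell)]
        if amounts:  # skip rows without any amounts
            rows.append(amounts)
    return rows


def rows_to_csv(rows):
    """Serialize the rows as pipe-delimited CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="|", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = extract_amount_rows(HTML)
csv_text = rows_to_csv(rows)
print(csv_text, end="")
```

The key difference from the loop in the question is that `get_text()` is called on each `<td>` rather than on the whole `<tr>`, so `writerow` receives a list with one element per cell instead of a single concatenated string.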