
Beautiful Soup to scrape data

I'm trying to use BeautifulSoup to scrape the EPS Estimates and EPS Earnings History tables (the 1st and 3rd tables on the page) from Yahoo Finance into an existing CSV file: https://uk.finance.yahoo.com/quote/MSFT/analysis?p=MSFT

I have made a start but am struggling to pull the exact data that I need. I am guessing I will need a for loop across the rows and td tags.

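from requests import get
from bs4 import BeautifulSoup

index = 'MSFT'  # example ticker, taken from the URL above
url = 'https://uk.finance.yahoo.com/quote/' + index + '/analysis?p=' + index
response = get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# find() returns only the first matching table
EP = soup.find('table', attrs={'class': 'W(100%)'})
print(EP)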

This appears to be getting only the first table, and I am not sure how to write the loop to get the appropriate data. Looking at the HTML, both the first and third tables seem to have the same class name, so I can't use that to go straight to the appropriate table.

Another idea I had is to search for all tables on the page and put them into a list. I could then select the correct index, but I'm not sure how I would do that in code.

Replace soup.find with soup.find_all(). It returns a list of all the matching tables, which you can then iterate over.

EPs = soup.find_all('table', attrs={'class':"W(100%)"})
for EP in EPs:
    ...

If that is what you are looking for, your first and third tables would be EPs[0] and EPs[2].
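
To illustrate the row and td loop the question asks about, here is a minimal sketch rather than a definitive solution. It assumes the three tables share the class W(100%) as described above. SAMPLE_HTML is a made-up stand-in for response.text with placeholder values, and table_rows is a hypothetical helper name; the real page markup may differ.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical stand-in for response.text; values are placeholders,
# and the real Yahoo Finance markup may differ.
SAMPLE_HTML = """
<table class="W(100%)"><tr><th>Earnings estimate</th><th>Current qtr.</th></tr>
<tr><td>Avg. Estimate</td><td>1.00</td></tr></table>
<table class="W(100%)"><tr><th>Revenue estimate</th><th>Current qtr.</th></tr>
<tr><td>Avg. Estimate</td><td>2.00</td></tr></table>
<table class="W(100%)"><tr><th>Earnings history</th><th>Last qtr.</th></tr>
<tr><td>EPS actual</td><td>3.00</td></tr></table>
"""


def table_rows(table):
    """Return a table as a list of rows, each row a list of cell strings."""
    rows = []
    for tr in table.find_all('tr'):
        # Collect header (th) and data (td) cells in document order.
        cells = [cell.get_text(strip=True) for cell in tr.find_all(['th', 'td'])]
        if cells:
            rows.append(cells)
    return rows


soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
tables = soup.find_all('table', attrs={'class': 'W(100%)'})

# Pick the first and third tables by index.
wanted = [table_rows(tables[i]) for i in (0, 2)]

# Written to an in-memory buffer here so the sketch has no side effects.
buffer = io.StringIO()
writer = csv.writer(buffer)
for rows in wanted:
    writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To append to an existing CSV file instead of the in-memory buffer, open it with open(path, 'a', newline='') and pass that file object to csv.writer.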

