
BeautifulSoup search within results

I'm trying to scrape Yahoo Finance for the annual performance of mutual funds. The page is set up so that the data I want is in an element with the same class as many other elements, and there are no unique identifiers. I can index to the element I want, but a different ticker changes the page layout, which also changes the index I would need, so a fixed index won't work.

My idea is to search the page for some unique text, in this case "2010", grab the data-reactid="205" attribute on the element that contains "2010", and then increment that data-reactid number to find the element I actually want. Hopefully this makes sense.

This is my test code so far:

import bs4
import requests

# Fetch the performance page for the fund.
url = requests.get('https://finance.yahoo.com/quote/APGAX/performance?p=APGAX')
soup = bs4.BeautifulSoup(url.text, features="html.parser")

# Collect every span whose class attribute is exactly this string.
ree = soup.find_all('span', attrs={"class": "W(10%) D(b) Fl(start) Ta(s)"})
print(ree)

Running that code returns about 30 different elements from the page. (I tried to paste them here, but this website changes the lines I paste, so I can't show them.)

The "2010" I want to search for is about a third of the way down the list, and the data-reactid="205" attribute is on that same element. The problem is that I don't know how to search within the results to find the particular element I want.

Anyone have any ideas on how to accomplish this? Thanks for the help. Sorry my description is not good. I'm pretty new at this but trying to learn Python.

Instead of parsing the web page, I recommend you use the Yahoo Finance API. There are Python libraries for accessing the API.

I hope this is what you are looking for, but please describe the problem clearly.

Instead of passing the response's decoded string ("text") to BeautifulSoup, pass its raw bytes ("content") so the parser can work out the encoding itself:

url = requests.get('https://finance.yahoo.com/quote/APGAX/performance?p=APGAX')
soup = bs4.BeautifulSoup(url.content, features="html.parser")

To search for a particular line, it is a good idea to first inspect the HTML to see which tag holds the content you want, including its exact class or id. For example, the code will look like this:

results = soup.find_all('span', class_='W(10%) D(b) Fl(start) Ta(s)')
print(results)
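
To then search within those results for the "2010" entry, something like the following might work. It is only a sketch: SAMPLE_HTML is a made-up stand-in for the real page (the actual Yahoo markup, class strings, and data-reactid values may differ, and the percentages are placeholders), and find_value_for_year is a hypothetical helper name. It filters the result list by text, reads the data-reactid of the match, and then gets the neighbouring value either as the next span sibling or, as a fallback, by looking up the element whose data-reactid is one higher.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the fetched page; real markup and values may differ.
SAMPLE_HTML = """
<div>
  <span class="W(10%) D(b) Fl(start) Ta(s)" data-reactid="200">2011</span>
  <span class="W(20%) D(b) Fl(start) Ta(e)" data-reactid="201">-1.52%</span>
  <span class="W(10%) D(b) Fl(start) Ta(s)" data-reactid="205">2010</span>
  <span class="W(20%) D(b) Fl(start) Ta(e)" data-reactid="206">12.34%</span>
</div>
"""

# Class string taken from the question; assumed to mark the year labels.
YEAR_CLASS = "W(10%) D(b) Fl(start) Ta(s)"


def find_value_for_year(soup, year):
    """Return (data-reactid of the year label, text of the value next to it),
    or None if no label with that exact text exists."""
    labels = soup.find_all("span", class_=YEAR_CLASS)

    # Search within the results: keep the first span whose text is the year.
    match = next((s for s in labels if s.get_text(strip=True) == year), None)
    if match is None:
        return None

    reactid = match.get("data-reactid")

    # Option 1: the value is assumed to be the next <span> sibling.
    value = match.find_next_sibling("span")

    # Option 2 (fallback): look up the element whose data-reactid is one higher.
    if value is None and reactid is not None and reactid.isdigit():
        value = soup.find(attrs={"data-reactid": str(int(reactid) + 1)})

    value_text = value.get_text(strip=True) if value is not None else None
    return reactid, value_text


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(find_value_for_year(soup, "2010"))
```

Matching on the label text rather than on a list index means the lookup should keep working when a different ticker shifts the position of the row, as long as the label itself is still present.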
