
Scraping Yahoo Finance with Python3

I'm a complete newbie at scraping. I'm trying to scrape https://fr.finance.yahoo.com, but I can't figure out what I'm doing wrong.

My goal is to scrape the index name, the current level, and the change (both in value and in %) for each index.

Here is the code I have used:

import urllib.request
from bs4 import BeautifulSoup

url = 'https://fr.finance.yahoo.com'
request = urllib.request.Request(url)
html = urllib.request.urlopen(request).read()
soup = BeautifulSoup(html,'html.parser')

main_table = soup.find("div",attrs={'data-reactid':'12'})
print(main_table)

links = main_table.find_all("li", class_=' D(ib) Bxz(bb) Bdc($seperatorColor) Mend(16px) BdEnd ')
print(links)

However, print(links) comes out empty. Could someone please help? Any help would be highly appreciated, as I have been trying to figure this out for a few days now.

I think you need to fix your element selection.

For example, the following code:

import urllib.request
from bs4 import BeautifulSoup
url = 'https://fr.finance.yahoo.com'

request = urllib.request.Request(url)
html = urllib.request.urlopen(request).read()

soup = BeautifulSoup(html,'html.parser')

main_table = soup.find(id="market-summary")

links = main_table.find_all("a")
for i in links:
    print(i.attrs["aria-label"])

prints text containing the index name, the percentage change, the change in points, and the current value:

CAC 40 a augmenté de 0,37 % ou 16,55 points pour atteindre 4 461,99 points
Euro Stoxx 50 a augmenté de 0,28 % ou 8,16 points pour atteindre 2 913,14 points
Dow Jones a diminué de -0,63 % ou -153,98 points pour atteindre 24 320,14 points
EUR/USD a diminué de -0,49 % ou -0,0054 points pour atteindre 1,0897 points
Gold future a augmenté de 0,88 % ou 15,10 points pour atteindre 1 737,00 points
 a augmenté de 1,46 % ou 121,30 points pour atteindre 8 402,74 points
CMC Crypto 200 a augmenté de 1,60 % ou 2,90 points pour atteindre 184,14 points
Pétrole WTI a diminué de -3,95 % ou -1,34 points pour atteindre 32,58 points
DAX a augmenté de 0,29 % ou 32,27 points pour atteindre 11 098,20 points
FTSE 100 a diminué de -0,39 % ou -23,18 points pour atteindre 5 992,07 points
Nasdaq  a diminué de -0,30 % ou -28,25 points pour atteindre 9 256,63 points
S&P 500 a diminué de -0,43 % ou -12,62 points pour atteindre 2 935,89 points
Nikkei 225 a diminué de -0,80 % ou -164,15 points pour atteindre 20 388,16 points
HANG SENG a diminué de -5,56 % ou -1 349,89 points pour atteindre 22 930,14 points
GBP/USD a diminué de -0,34 % ou -0,0041 points pour atteindre 1,2186 points
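If you want the individual fields rather than the full sentence, one option is to split each aria-label with a regular expression. This is only a sketch: the pattern is modelled on the French strings printed above, the names LABEL_RE and parse_label are made up for illustration, and the site may change the wording at any time.

```python
import re

# Pattern modelled on the aria-label strings shown above (an assumption, not a documented format).
LABEL_RE = re.compile(
    r"^(?P<name>.*?)\s*a (?:augmenté|diminué) de "
    r"(?P<pct>[-+]?[\d\s,]+?)\s*% ou "
    r"(?P<change>[-+]?[\d\s,]+?)\s*points pour atteindre "
    r"(?P<level>[\d\s,]+?)\s*points$"
)

def parse_label(label):
    """Split one aria-label into name, pct, change and level; return None if it does not match."""
    match = LABEL_RE.match(label.strip())
    if match is None:
        return None
    fields = match.groupdict()
    # "".join(x.split()) removes every whitespace character, including the
    # (possibly non-breaking) spaces used as thousands separators.
    return {
        "name": fields["name"].strip(),
        "pct": "".join(fields["pct"].split()),
        "change": "".join(fields["change"].split()),
        "level": "".join(fields["level"].split()),
    }

print(parse_label("CAC 40 a augmenté de 0,37 % ou 16,55 points pour atteindre 4 461,99 points"))
```

The values are kept as strings with a decimal comma. A label with a missing name, like the one in the output above, still parses and gives an empty "name".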

Although the better way to get all the fields is to parse and process the relevant script tag, this is one of the ways you can get all of them.

import requests
import pandas as pd
from bs4 import BeautifulSoup

url = 'https://fr.finance.yahoo.com/'

# Send a browser-like User-Agent header with the request
r = requests.get(url, headers={"User-Agent": "Mozilla/5.0"})
soup = BeautifulSoup(r.text, 'html.parser')

rows = []
for item in soup.select("[id='market-summary'] li"):
    link = item.select_one("a")
    rows.append({
        'Index Name': link.contents[1],
        # ''.join(text.split()) strips the spaces used as thousands separators
        'Current Level': ''.join(item.select_one("a > span").text.split()),
        # The change in points sits between "ou" and "points" in the aria-label
        'Value': ''.join(link['aria-label'].split("ou")[1].split("points")[0].split()),
        'Percentage Change': ''.join(item.select_one("a > span + span").text.split()),
    })

# DataFrame.append was removed in pandas 2.0, so build the frame from the list of rows
df = pd.DataFrame(rows, columns=['Index Name', 'Current Level', 'Value', 'Percentage Change'])

print(df)

The output looks like this:

        Index Name Current Level     Value Percentage Change
0           CAC 40       4444,56     -0,88            -0,02%
1    Euro Stoxx 50       2905,47      0,49            +0,02%
2        Dow Jones      24438,63    -35,49            -0,15%
3          EUR/USD        1,0906   -0,0044            -0,40%
4      Gold future       1734,10     12,20            +0,71%
5          BTC-EUR       8443,23    161,79            +1,95%
6   CMC Crypto 200        185,66      4,42            +2,44%
7      Pétrole WTI         33,28     -0,64            -1,89%
8              DAX      11073,87      7,94            +0,07%
9         FTSE 100       5993,28    -21,97            -0,37%
10         Nasdaq        9315,26     30,38            +0,33%
11         S&P 500       2951,75      3,24            +0,11%
12      Nikkei 225      20388,16   -164,15            -0,80%
13       HANG SENG      22930,14  -1349,89            -5,56%
14         GBP/USD        1,2177   -0,0051            -0,41%
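Note that every column above holds strings in French number format (decimal comma, optional sign, trailing % on the percentages). If you need real numbers, a small helper along these lines might do. The name to_float is just something I made up for this sketch, and it assumes the values look like the ones printed above.

```python
def to_float(text):
    """Convert a string such as '4444,56', '+0,02%' or '-1 349,89' to a float."""
    cleaned = "".join(text.split())      # remove spaces, including non-breaking ones
    cleaned = cleaned.rstrip("%")        # drop a trailing percent sign
    cleaned = cleaned.replace(",", ".")  # decimal comma -> decimal point
    return float(cleaned)

print(to_float("4444,56"))
print(to_float("-0,02%"))
```

With the DataFrame above you could then apply it per column, for example df['Current Level'].map(to_float).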

Try the following CSS selector to get all the links.

import urllib.request
from bs4 import BeautifulSoup


url = 'https://fr.finance.yahoo.com'
request = urllib.request.Request(url)
html = urllib.request.urlopen(request).read()
soup = BeautifulSoup(html,'html.parser')
links=[link['href'] for link in soup.select("ul#market-summary a")]
print(links)

Output:

['/quote/^FCHI?p=^FCHI', '/quote/^STOXX50E?p=^STOXX50E', '/quote/^DJI?p=^DJI', '/quote/EURUSD=X?p=EURUSD=X', '/quote/GC=F?p=GC=F', '/quote/BTC-EUR?p=BTC-EUR', '/quote/^CMC200?p=^CMC200', '/quote/CL=F?p=CL=F', '/quote/^GDAXI?p=^GDAXI', '/quote/^FTSE?p=^FTSE', '/quote/^IXIC?p=^IXIC', '/quote/^GSPC?p=^GSPC', '/quote/^N225?p=^N225', '/quote/^HSI?p=^HSI', '/quote/GBPUSD=X?p=GBPUSD=X']
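These hrefs are relative, so to follow them you need to join them with the site address first. Here is a rough sketch using only the standard library. BASE_URL, absolute_url and ticker_from_href are names I picked for illustration, and the ticker extraction assumes the '/quote/<symbol>?p=<symbol>' layout shown in the output above.

```python
from urllib.parse import urljoin, urlsplit

BASE_URL = "https://fr.finance.yahoo.com"

def absolute_url(href):
    """Turn a relative href such as '/quote/^FCHI?p=^FCHI' into a full URL."""
    return urljoin(BASE_URL, href)

def ticker_from_href(href):
    """Take the last path segment: '/quote/^FCHI?p=^FCHI' -> '^FCHI'."""
    return urlsplit(href).path.rsplit("/", 1)[-1]

links = ['/quote/^FCHI?p=^FCHI', '/quote/EURUSD=X?p=EURUSD=X']
for href in links:
    print(ticker_from_href(href), absolute_url(href))
```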
