
Scrape data in Python from Yahoo Finance

I want to scrape data from Yahoo Finance for a specific symbol.

I can scrape data that is laid out in a table, but not data outside a table. I applied the same approach to other information on the same page and got no result.

So far I can scrape the table from https://finance.yahoo.com/quote/AAPL/profile?p=AAPL

The code I use to scrape the table is:

import io

import pandas as pd
import requests
from lxml import etree, html

symbol = 'AAPL'

url = 'https://finance.yahoo.com/quote/' + symbol + '/profile?p=' + symbol

page = requests.get(url)
tree = html.fromstring(page.content)

# Select every <table> element on the page
table = tree.xpath('//table')

# The profile page is expected to contain exactly one table
assert len(table) == 1

# Serialize the table back to HTML text and let pandas parse it
tstring = etree.tostring(table[0], method='html', encoding='unicode')
df = pd.read_html(io.StringIO(tstring))[0]

df

I want to scrape the information shown on the right side of the page:

Sector: Consumer Goods
Industry: Electronic Equipment
Full Time Employees: 137,000

I would appreciate it if you could help me get this information or give me some tips and advice.

You can use the XPath following-sibling axis to select the span that comes right after each label:

import requests
from lxml import html

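# Expression for a single label: find the label span, then take the first span after it.
# The loop further down builds the same expression for each label.
xp = "//span[text()='Sector']/following-sibling::span[1]"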

symbol = 'AAPL'

url = 'https://finance.yahoo.com/quote/' + symbol + '/profile?p=' + symbol

page = requests.get(url)
tree = html.fromstring(page.content)

d = {}
for label in ['Sector', 'Industry', 'Full Time Employees']:
    xp = f"//span[text()='{label}']/following-sibling::span[1]"
    s = tree.xpath(xp)[0]
    d[label] = s.text_content()


print(d['Full Time Employees'])
print(d['Industry'])
print(d['Sector'])
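
To try the following-sibling approach without depending on the live page, here is a minimal sketch that runs against a hand-written HTML fragment. The markup in SAMPLE is an assumption modeled on the label/value layout in the question, not Yahoo's actual HTML, which may differ and changes over time. The helper value_after_label and the 'Founded' label are hypothetical names used only for illustration.

```python
from lxml import html

# Hypothetical markup modeled on the label/value layout in the question;
# the real Yahoo Finance HTML may differ.
SAMPLE = """
<div>
  <p>
    <span>Sector</span>: <span>Consumer Goods</span><br>
    <span>Industry</span>: <span>Electronic Equipment</span><br>
    <span>Full Time Employees</span>: <span><span>137,000</span></span>
  </p>
</div>
"""


def value_after_label(tree, label):
    """Return the text of the first <span> sibling after the label span, or None."""
    matches = tree.xpath(f"//span[text()='{label}']/following-sibling::span[1]")
    # tree.xpath() returns an empty list when nothing matches, so guard the index
    return matches[0].text_content().strip() if matches else None


tree = html.fromstring(SAMPLE)
labels = ['Sector', 'Industry', 'Full Time Employees', 'Founded']
info = {label: value_after_label(tree, label) for label in labels}
print(info)
```

Returning None for a missing label avoids the IndexError that tree.xpath(xp)[0] raises when the page layout changes or a label is absent, which is worth guarding against when scraping a site whose markup you do not control.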
