This code runs and prints multiple links found on a single website (the URL is in the code). The site pulls its data from multiple links and presents it as one single table.
Can you suggest what changes to make to this code so that it gets the data and tabulates it, without importing any further libraries?
# import libraries
import re
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import urllib.request as ur
from bs4 import BeautifulSoup

# download the raw HTML once and print it
s = ur.urlopen("https://financials.morningstar.com/ratios/r.html?t=AAPL")
s1 = s.read()
print(s1)

# parse the downloaded HTML instead of fetching the page a second time
soup = BeautifulSoup(s1, "html.parser")
title = soup.title
print(title)
text = soup.get_text()
print(text)

# collect every href that contains "http"
links = []
for link in soup.find_all(attrs={'href': re.compile("http")}):
    links.append(link.get('href'))
print(links)
The expected result is a tabular form of the ratios as listed, where each ratio can be represented as a dictionary with the year as the key and the ratio as the value.
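To make the target structure concrete, here is a minimal sketch of one table row turned into a dictionary keyed by year. The ratio name, year labels, and values are placeholders invented for illustration, not data from the page.

```python
# Placeholder header and row; these are illustrative values, not real data.
years = ["2016", "2017", "2018"]
row = ["Example Ratio", 1.0, 2.0, 3.0]

# The first cell is the ratio name; the remaining cells line up with the years.
name, values = row[0], row[1:]
record = {name: dict(zip(years, values))}

print(record)
```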
1) Here is one way with selenium and pandas. You can view the final structure here. The content is loaded by JavaScript, so I think it is likely you will need additional libraries.
2) The page makes a call to this:
that returns JSON containing the info for the page. You might try using requests with that.
from io import StringIO
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd
import copy

d = webdriver.Chrome()
d.get('https://financials.morningstar.com/ratios/r.html?t=AAPL')
# wait up to 10 seconds for the JavaScript-rendered tables to appear
tables = WebDriverWait(d,10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, "#tab-profitability table")))
results = []
for table in tables:
    # wrap the HTML in StringIO; newer pandas versions warn on literal HTML strings
    t = pd.read_html(StringIO(table.get_attribute('outerHTML')))[0].dropna()
    years = t.columns[1:]
    for row in t.itertuples(index=True, name='Pandas'):
        # row[0] is the index, row[1] the row label, row[2:] the yearly values
        record = {row[1] : dict(zip(years, row[2:]))}
        results.append(copy.deepcopy(record))
print(results)
d.quit()
You end up with all 17 rows being listed. First two rows shown here with row 2 expanded to show pairing of years with values.
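Since results is a list of single-key dictionaries, looking up one ratio means scanning the list. Below is a small sketch of flattening it into one dictionary keyed by row name. The helper name merge_records and the sample list are hypothetical stand-ins with placeholder values, shaped like the records built in the loop above.

```python
def merge_records(records):
    """Combine a list of {name: {year: value}} dicts into a single dict."""
    merged = {}
    for record in records:
        for name, by_year in record.items():
            # If a name appears more than once, its year/value pairs are combined.
            merged.setdefault(name, {}).update(by_year)
    return merged


# Placeholder records; not real values from the page.
sample = [
    {"Ratio A": {"2017": 1.0, "2018": 2.0}},
    {"Ratio B": {"2017": 3.0, "2018": 4.0}},
]
merged = merge_records(sample)
print(merged["Ratio B"]["2018"])
```

With pandas already imported, pd.DataFrame.from_dict(merged, orient="index") should then give one row per ratio and one column per year.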