Get a specific value with BeautifulSoup (parsing)
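I'm trying to extract information from a website using Python (BeautifulSoup).
I want to extract the following data (just the numbers):
EPS (Basic)
from: https://www.marketwatch.com/investing/stock/aapl/financials/income/quarter
From the XML:
I built this code: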
import pandas as pd
from bs4 import BeautifulSoup
import urllib.request as ur
url_is = 'https://www.marketwatch.com/investing/stock/aapl/financials/income/quarter'
read_data = ur.urlopen(url_is).read()
soup_is = BeautifulSoup(read_data, 'lxml')
# every table row with class "mainRow"
cells = soup_is.find_all('tr', {'class': 'mainRow'})
for cell in cells:
    print(cell.text)
But I can't manage to extract the numbers for EPS (Basic).
Is there a way to extract only the data and arrange it by column?
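One way to get the "only the data, arranged in columns" shape asked for here is to split each `tr.mainRow` into its title cell and its value cells, then let pandas lay the rows out. This is a minimal sketch run against a hypothetical HTML fragment (`SAMPLE_HTML`, with placeholder labels and values) that only assumes the row structure targeted in the code above, not the live MarketWatch page; `rows_to_frame` is an illustrative name. It uses the built-in `html.parser` so `lxml` is not required.

```python
from typing import Dict, List

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup: rows with class "mainRow", a "rowTitle" cell,
# then one cell per period. Labels and values are placeholders.
SAMPLE_HTML = """
<table>
  <tr class="mainRow"><td class="rowTitle">Row A</td><td>1.0</td><td>2.0</td></tr>
  <tr class="mainRow"><td class="rowTitle">Row B</td><td>3.0</td><td>4.0</td></tr>
</table>
"""


def rows_to_frame(html: str) -> pd.DataFrame:
    """Build a DataFrame with one row per tr.mainRow, indexed by its title cell."""
    soup = BeautifulSoup(html, "html.parser")
    rows: Dict[str, List[str]] = {}
    for tr in soup.find_all("tr", class_="mainRow"):
        # stripped text of every cell in the row, empty cells dropped
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        cells = [c for c in cells if c]
        if cells:
            label, *values = cells
            rows[label] = values
    # orient="index": each row title becomes an index label, each period a column
    return pd.DataFrame.from_dict(rows, orient="index")


df = rows_to_frame(SAMPLE_HTML)
print(df)
```

The values stay strings here; the real column headers (the quarter labels) would have to be read from the table's header row, which this sketch does not attempt.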
Try the following CSS selector, which checks whether a td tag contains the text EPS (Basic):
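from bs4 import BeautifulSoup
import urllib.request as ur
url_is = 'https://www.marketwatch.com/investing/stock/aapl/financials/income/quarter'
read_data = ur.urlopen(url_is).read()
soup_is = BeautifulSoup(read_data, 'lxml')
# the rowTitle cell whose text contains "EPS (Basic)" (:-soup-contains replaces the deprecated :contains)
row = soup_is.select_one('tr.mainRow>td.rowTitle:-soup-contains("EPS (Basic)")')
# row.parent is the enclosing <tr>; keep the text of every non-empty <td> in it
print([cell.text for cell in row.parent.select('td') if cell.text != ''])
Output: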
[' EPS (Basic)', '2.47', '2.20', '3.05', '5.04', '2.58']
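To check the selector without a network request, here is a minimal sketch that runs the same `select_one` / `.parent` logic against a hypothetical HTML fragment. The fragment (`SAMPLE_HTML`) assumes only the structure the selector itself implies (`tr.mainRow > td.rowTitle`); the "Other Item" row and its value are placeholders, and the live page markup may differ. `extract_row` is an illustrative helper name, and the built-in `html.parser` is used instead of `lxml`.

```python
from typing import List

from bs4 import BeautifulSoup

# Hypothetical markup modeled on the selector tr.mainRow > td.rowTitle.
# The "Other Item" row and its value are placeholders.
SAMPLE_HTML = """
<table>
  <tr class="mainRow">
    <td class="rowTitle"> Other Item</td>
    <td>0.00</td>
  </tr>
  <tr class="mainRow">
    <td class="rowTitle"> EPS (Basic)</td>
    <td>2.47</td><td>2.20</td><td></td>
  </tr>
</table>
"""


def extract_row(html: str, label: str) -> List[str]:
    """Return the non-empty cell texts of the row whose title cell contains `label`."""
    soup = BeautifulSoup(html, "html.parser")
    # :-soup-contains matches elements whose text contains the given substring
    title_cell = soup.select_one(f'tr.mainRow > td.rowTitle:-soup-contains("{label}")')
    if title_cell is None:  # no matching row
        return []
    # title_cell.parent is the <tr>; the empty <td> is filtered out
    return [cell.text for cell in title_cell.parent.select("td") if cell.text != ""]


print(extract_row(SAMPLE_HTML, "EPS (Basic)"))
```

Returning an empty list when nothing matches avoids the `AttributeError` that `row.parent` raises if `select_one` returns `None`.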
Printing it in a DataFrame:
import pandas as pd
from bs4 import BeautifulSoup
import urllib.request as ur
url_is = 'https://www.marketwatch.com/investing/stock/aapl/financials/income/quarter'
read_data = ur.urlopen(url_is).read()
soup_is = BeautifulSoup(read_data, 'lxml')
row = soup_is.select_one('tr.mainRow>td.rowTitle:-soup-contains("EPS (Basic)")')
data = [cell.text for cell in row.parent.select('td') if cell.text != '']
# DataFrame(data) is a single column of 6 strings; .T transposes it into one row
df = pd.DataFrame(data)
print(df.T)
Output:
0 1 2 3 4 5
0 EPS (Basic) 2.47 2.20 3.05 5.04 2.58
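In that DataFrame the label sits in column 0 and the numbers are still strings. A minimal sketch of one alternative, assuming the same list as the printed output: use the stripped label as the row index and convert the values to floats with `pd.to_numeric`, so they can be sorted or used in arithmetic. The column labels are plain integers here because the quarter headers were not scraped.

```python
import pandas as pd

# The list printed by the answer's code: a row title followed by quarterly values
data = [' EPS (Basic)', '2.47', '2.20', '3.05', '5.04', '2.58']

label, *values = data
# One row labelled by the stripped title; values converted from str to float
df = pd.DataFrame([pd.to_numeric(values)], index=[label.strip()])
print(df)
```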
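Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.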