Getting specific table from web page with BeautifulSoup
I want to get the data from the 3rd table on http://www.dividend.com/dividend-stocks/. Here is my code; I need some help.
import requests
from bs4 import BeautifulSoup

url = "http://www.dividend.com/dividend-stocks/"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html5lib")

# Skip first two tables
tables = soup.find("table")
tables = tables.find_next("table")
tables = tables.find_next("table")

row = ''
for td in tables.find_all("td"):
    if len(td.text.strip()) > 0:
        row = row + td.text.strip().replace('\n', ' ') + ','
    # Handle last column in a row: remove extra comma and add a newline
    if td.get('data-th') == 'Pay Date':
        row = row[:-1] + '\n'
print(row)
but the code output is like this:
AAPL,Apple Inc.,1.76%,$143.39,$2.52,5/11,5/18
GE,General Electric,3.32%,$28.91,$0.96,6/15,7/25
XOM,Exxon Mobil,3.71%,$83.03,$3.08,5/10,6/9
CVX,Chevron Corp,4.01%,$107.72,$4.32,5/17,6/12
BP,BP PLC ADR,6.66%,$35.72,$2.38,5/10,6/23
What did I do wrong? Thanks for any help!
You can use a selector to find a specific table:
tables = soup.select("table:nth-of-type(3)")
I'm not sure why your results are in a different order than they appear on the web page.
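To see the selector in action without hitting the live site, here is a minimal sketch against an inline document (the three-sibling-table layout is just an assumption mirroring the page structure):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: three sibling tables, like the dividend.com page.
html = """
<div>
  <table><tr><td>first</td></tr></table>
  <table><tr><td>second</td></tr></table>
  <table><tr><td>third</td></tr></table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# select() returns a list of matches; select_one() returns the single
# matching tag directly, so no [0] indexing is needed.
third = soup.select_one("table:nth-of-type(3)")
print(third.td.text)  # third
```

Note that `soup.select("table:nth-of-type(3)")` (as in the answer above) returns a list, so you would still index `[0]` before calling `find_all` on it; `select_one` avoids that step.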
Although @Barmar's method seems cleaner, here is another alternative using soup.find_all and saving to JSON (even though that wasn't in the description).
import json
import requests
from bs4 import BeautifulSoup

url = 'http://www.dividend.com/dividend-stocks/'
r = requests.get(url)
r.raise_for_status()
soup = BeautifulSoup(r.content, 'lxml')

stocks = {}
# Skip first two tables and the header row of the target table
for tr in soup.find_all('table')[2].find_all('tr')[1:]:
    (stock_symbol, company_name, _, dividend_yield, current_price,
     annual_dividend, ex_dividend_date, pay_date) = [
        td.text.strip() for td in tr.find_all('td')]
    stocks[stock_symbol] = {
        'company_name': company_name,
        'dividend_yield': float(dividend_yield.rstrip('%')),
        'current_price': float(current_price.lstrip('$')),
        'annual_dividend': float(annual_dividend.lstrip('$')),
        'ex_dividend_date': ex_dividend_date,
        'pay_date': pay_date
    }

with open('stocks.json', 'w') as f:
    json.dump(stocks, f, indent=2)
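As a side note, the saved file can be read back with `json.load`. A minimal round-trip sketch, using a hypothetical one-entry dict in the same shape the loop above produces rather than live data:

```python
import json

# Hypothetical single entry shaped like the scraped records above.
stocks = {
    "AAPL": {
        "company_name": "Apple Inc.",
        "dividend_yield": 1.76,
        "current_price": 143.39,
        "annual_dividend": 2.52,
        "ex_dividend_date": "5/11",
        "pay_date": "5/18",
    }
}

# Write the dict out, then read it back and check a field survived.
with open("stocks.json", "w") as f:
    json.dump(stocks, f, indent=2)

with open("stocks.json") as f:
    loaded = json.load(f)

print(loaded["AAPL"]["dividend_yield"])  # 1.76
```

Since the numeric fields were converted to float before saving, they come back as float rather than strings like "$143.39", which makes later sorting or filtering straightforward.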
Thanks @Barmar and @Delirious Lettuce for posting the solutions and code. Regarding the order of the output: I realized that every time I refresh the page, I briefly glimpse the data in the same order my code pulled it, and then I see the sorted data. After trying a few different approaches, I was able to use the Selenium webdriver to pull the data in the order the web page presents it. Thanks all.
BPT,BP Prudhoe Bay Royalty Trust,21.12%,$20.80,$4.39,4/11,4/20
PER,Sandridge Permian Trust,18.06%,$2.88,$0.52,5/10,5/26
CHKR,Chesapeake Granite Wash Trust,16.75%,$2.40,$0.40,5/18,6/1
NAT,Nordic American Tankers,13.33%,$6.00,$0.80,5/18,6/8
WIN,Windstream Corp,13.22%,$4.54,$0.60,6/28,7/17
NYMT,New York Mortgage Trust Inc,12.14%,$6.59,$0.80,6/22,7/25
IEP,Icahn Enterprises L.P.,11.65%,$51.50,$6.00,5/11,6/14
FTR,Frontier Communications,11.51%,$1.39,$0.16,6/13,6/30