Webpage values are missing while scraping data using BeautifulSoup python 3.6
I am using the script below to scrape the "STOCK QUOTE" data from http://fortune.com/fortune500/xcel-energy/, but it returns blanks.
I have also tried a selenium driver, but I hit the same problem. Please help.
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

r = requests.get('http://fortune.com/fortune500/xcel-energy/')
soup = bs(r.content, 'lxml')  # also tried: 'html.parser'
data = pd.DataFrame(columns=['C1','C2','C3','C4'], dtype='object', index=range(0,11))
for table in soup.find_all('div', {'class': 'stock-quote row'}):
    row_marker = 0
    for row in table.find_all('li'):
        column_marker = 0
        columns = row.find_all('span')
        for column in columns:
            data.iat[row_marker, column_marker] = column.get_text()
            column_marker += 1
        row_marker += 1
print(data)
Output:
C1 C2 C3 C4
0 Previous Close: NaN NaN
1 Market Cap: NaNB NaN B
2 Next Earnings Date: NaN NaN
3 High: NaN NaN
4 Low: NaN NaN
5 52 Week High: NaN NaN
6 52 Week Low: NaN NaN
7 52 Week Change %: 0.00 NaN NaN
8 P/E Ratio: n/a NaN NaN
9 EPS: NaN NaN
10 Dividend Yield: n/a NaN NaN
It looks like the data you are looking for is available from this API endpoint:
import requests
response = requests.get("http://fortune.com/api/v2/company/xel/expand/1")
data = response.json()
print(data['ticker'])
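To shape a JSON response like that into the tabular form the question was after, the key/value pairs can be loaded straight into a DataFrame. The snippet below is a minimal sketch using made-up sample data; the field names are illustrative assumptions, not the actual schema returned by the Fortune API:

```python
import pandas as pd

# Illustrative sample only -- these keys are assumptions, not the real
# structure of the http://fortune.com/api/v2/... response.
sample = {
    'ticker': {
        'previousClose': 48.60,
        'marketCap': '24.7B',
        'high': 49.10,
        'low': 48.20,
    }
}

# Flatten the key/value pairs into a two-column table, which is roughly
# what the original scraping loop was trying to build.
quote = pd.DataFrame(sample['ticker'].items(), columns=['Field', 'Value'])
print(quote)
```

This avoids positional `iat` bookkeeping entirely: the DataFrame is built in one call from the pairs themselves.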
FYI, when opening the page in a selenium-automated browser, you just need to make sure you wait for the desired data to appear before parsing the HTML. Working code:
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import pandas as pd

url = 'http://fortune.com/fortune500/xcel-energy/'

driver = webdriver.Chrome()
wait = WebDriverWait(driver, 10)

driver.get(url)
wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".stock-quote")))
page_source = driver.page_source
driver.close()

# HTML parsing part
soup = BeautifulSoup(page_source, 'lxml')  # also tried: 'html.parser'
data = pd.DataFrame(columns=['C1','C2','C3','C4'], dtype='object', index=range(0,11))
for table in soup.find_all('div', {'class': 'stock-quote'}):
    row_marker = 0
    for row in table.find_all('li'):
        column_marker = 0
        columns = row.find_all('span')
        for column in columns:
            data.iat[row_marker, column_marker] = column.get_text()
            column_marker += 1
        row_marker += 1
print(data)
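The parsing part itself works once the rendered HTML actually contains the values. The sketch below runs the same li/span extraction against a small static snippet that mimics the stock-quote markup (the markup here is made up for demonstration; the live page's structure may differ), building the DataFrame from the collected rows instead of writing cells one by one:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Made-up snippet mimicking the stock-quote markup, so the parsing
# logic can be tested without a browser or network access.
html = """
<div class="stock-quote">
  <ul>
    <li><span>Previous Close:</span><span>48.60</span></li>
    <li><span>Market Cap:</span><span>24.7B</span></li>
  </ul>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# One inner list per <li>, one element per <span> -- same traversal as
# the loop above, collected into rows first.
rows = [[span.get_text() for span in li.find_all('span')]
        for li in soup.select('div.stock-quote li')]
data = pd.DataFrame(rows, columns=['C1', 'C2'])
print(data)
```

Building the frame from a list of rows sidesteps pre-sizing the DataFrame and the manual row/column markers.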