Trouble using pandas read_html(): ValueError

from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests
import pandas as pd

url = "https://finance.naver.com/item/sise_day.nhn?code=068270&page=1"
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
res = requests.get(url, verify=True, headers=headers)


with urlopen(url) as doc:
    html = BeautifulSoup(res.text, 'lxml') 
    pgrr = html.find('td', class_='pgRR') 
    s = str(pgrr.a['href']).split('=')
    last_page = s[-1]


df = pd.DataFrame()
sise_url = 'http://finance.naver.com/item/sise_day.nhn?code=068270'


for page in range(1, int(last_page)+1): 
    page_url = '{}&page={}'.format(sise_url, page)  
    df = df.append(pd.read_html(page_url, encoding='euc-kr', header='0')[0])

df = df.dropna() # drop rows with missing values
print(df)

I'm getting this ValueError while crawling the daily stock data on Naver Finance. I have no trouble fetching the URL, but when I call read_html() I get ValueError: Table not found on the line df = df.append(pd.read_html(page_url, encoding='euc-kr', header='0')[0]). Please give some advice.

I don't read Korean... however pd.read_html() was getting an error page, because passing the URL directly sends the request without browser headers. Resolved this by fetching each page with requests.get() with headers, then passing res.text to read_html().

from bs4 import BeautifulSoup
from urllib.request import urlopen
import requests
import pandas as pd

url = "https://finance.naver.com/item/sise_day.nhn?code=068270&page=1"
headers = {'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36'}
res = requests.get(url, verify=True, headers=headers)


html = BeautifulSoup(res.text, 'lxml')
pgrr = html.find('td', class_='pgRR')           # link to the last page
last_page = str(pgrr.a['href']).split('=')[-1]  # page number from the href


df = pd.DataFrame()
sise_url = 'http://finance.naver.com/item/sise_day.nhn?code=068270'

for page in range(1, int(last_page)+1): 
    page_url = '{}&page={}'.format(sise_url, page)  
    res = requests.get(page_url, verify=True, headers=headers)
    df = df.append(pd.read_html(res.text, encoding='euc-kr')[0])
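One caveat: DataFrame.append() was deprecated in pandas 1.4 and removed in pandas 2.0, so on newer pandas the loop above will raise AttributeError. The usual replacement is to collect each page's frame in a list and concatenate once at the end. A minimal sketch of that pattern (the per-page DataFrames here are placeholders standing in for pd.read_html(res.text, ...)[0]):

```python
import pandas as pd

frames = []
for page in range(1, 4):
    # placeholder for the frame returned by pd.read_html() for each page
    page_df = pd.DataFrame({'page': [page], 'close': [100 + page]})
    frames.append(page_df)

# concatenate all pages at once instead of df = df.append(...)
df = pd.concat(frames, ignore_index=True)
print(len(df))
```

Building the list and calling pd.concat() once is also faster than appending inside the loop, since each append copies the whole accumulated frame.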
