I need to download some stock data for university, but I get NaN values for the dates. Can anyone help?
import requests
import pandas as pd

header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}

def house_stock_trading():
    url = "https://www.quiverquant.com/sources/housetrading"
    r = requests.get(url, headers=header)
    # Take the first table found on the page
    df = pd.read_html(r.text)[0]
    df.to_excel("data/house_stock_trading.xlsx", index=False)
The rest of the table looks good, but the dates still come back as NaN values. Any ideas?
Try passing flavor="html5lib" to pd.read_html() (this parser requires the html5lib and beautifulsoup4 packages to be installed):
import requests
import pandas as pd

header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}

def house_stock_trading():
    url = "https://www.quiverquant.com/sources/housetrading"
    r = requests.get(url, headers=header)
    # Parse the page with html5lib instead of the default parser
    df = pd.read_html(
        r.text,
        flavor="html5lib",
    )[0]
    return df

print(house_stock_trading().head())
Prints:
   Stock * Date Disclosed                       Rep.  Purchase / Sale          Amount  District
0     SQ         2/1/2022  Donald Sternoff Beyer Jr.             Sale  $1,001-$15,000      VA08
1    PEP         2/1/2022  Donald Sternoff Beyer Jr.         Purchase  $1,001-$15,000      VA08
2    PEP         2/1/2022  Donald Sternoff Beyer Jr.         Purchase  $1,001-$15,000      VA08
3   CYRX         2/1/2022  Donald Sternoff Beyer Jr.             Sale  $1,001-$15,000      VA08
4    BBH         2/1/2022  Donald Sternoff Beyer Jr.             Sale  $1,001-$15,000      VA08
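If you want to confirm that the dates really came through before writing the file, a small check along these lines may help. This is only a sketch: check_date_column is a hypothetical helper, not part of pandas, and it assumes the date column can be found by looking for "Date" in the column name and that the values are month/day/year strings like the ones printed above. The sample frame is a hand-built stand-in for the scraped table, so nothing here touches the network.

```python
import pandas as pd


def check_date_column(df: pd.DataFrame, keyword: str = "Date") -> pd.Series:
    """Return the first column whose name contains `keyword`, parsed as datetimes.

    Hypothetical helper: raises ValueError if no such column exists
    or if any value in it is missing (NaN/None).
    """
    matches = [col for col in df.columns if keyword in str(col)]
    if not matches:
        raise ValueError(f"no column name contains {keyword!r}")

    dates = df[matches[0]]
    missing = int(dates.isna().sum())
    if missing:
        raise ValueError(
            f"{missing} of {len(dates)} values in {matches[0]!r} are NaN"
        )

    # Assumption: values look like "2/1/2022" (month/day/year).
    return pd.to_datetime(dates, format="%m/%d/%Y")


# Stand-in for the scraped table; the column names here are assumed.
sample = pd.DataFrame(
    {
        "Stock": ["SQ", "PEP"],
        "Date Disclosed": ["2/1/2022", "2/1/2022"],
        "Purchase / Sale": ["Sale", "Purchase"],
    }
)

parsed = check_date_column(sample)
print(parsed.dt.strftime("%Y-%m-%d").tolist())
```

With the real data you would call check_date_column(house_stock_trading()) instead. If it raises, the parser is still dropping the dates and the Excel export would contain the same NaN values.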