
Pandas returns NaN values for downloaded dates (read_html)

I need to download some stock data for university, but I get NaN values in the date column. Can anyone help?

import requests
import pandas as pd

header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}


def house_stock_trading():
    url = "https://www.quiverquant.com/sources/housetrading"

    r = requests.get(url, headers=header)
    # Take the first table found on the page
    df = pd.read_html(r.text)[0]
    df.to_excel("data/house_stock_trading.xlsx", index=False)

The rest of the table looks fine, but the dates still come back as NaN. Any ideas?

Try passing flavor="html5lib" to pd.read_html() (this needs the html5lib and beautifulsoup4 packages installed):

import requests
import pandas as pd


header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest",
}


def house_stock_trading():
    url = "https://www.quiverquant.com/sources/housetrading"
    r = requests.get(url, headers=header)
    df = pd.read_html(
        r.text,
        flavor="html5lib",
    )[0]
    return df


print(house_stock_trading().head())

Prints:

  Stock * Date Disclosed                       Rep. Purchase / Sale          Amount District
0      SQ       2/1/2022  Donald Sternoff Beyer Jr.            Sale  $1,001-$15,000     VA08
1     PEP       2/1/2022  Donald Sternoff Beyer Jr.        Purchase  $1,001-$15,000     VA08
2     PEP       2/1/2022  Donald Sternoff Beyer Jr.        Purchase  $1,001-$15,000     VA08
3    CYRX       2/1/2022  Donald Sternoff Beyer Jr.            Sale  $1,001-$15,000     VA08
4     BBH       2/1/2022  Donald Sternoff Beyer Jr.            Sale  $1,001-$15,000     VA08
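
If you want to confirm that the dates really came through, a quick check along these lines may help. This is only a sketch: the sample rows are copied from the output above so that it runs without a network call, the helper name check_dates is made up for illustration, and the month/day/year format is an assumption based on how the dates are printed.

```python
import pandas as pd

# Sample rows copied from the printed output above; in a real run you would
# pass the DataFrame returned by house_stock_trading() instead.
sample = pd.DataFrame(
    {
        "Date Disclosed": ["2/1/2022"] * 5,
        "Purchase / Sale": ["Sale", "Purchase", "Purchase", "Sale", "Sale"],
        "District": ["VA08"] * 5,
    }
)


def check_dates(df, column="Date Disclosed"):
    """Hypothetical helper: count missing dates and parse the column."""
    missing = int(df[column].isna().sum())
    # Assumed format: month/day/year. errors="coerce" turns anything that
    # does not match into NaT instead of raising an exception.
    parsed = pd.to_datetime(df[column], format="%m/%d/%Y", errors="coerce")
    return missing, parsed


missing, parsed = check_dates(sample)
print("missing dates:", missing)
print(parsed.head())
```

If missing is greater than zero on the real data, the parser is still dropping the date cells, so it is worth looking at the raw HTML in r.text for that column.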

Technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact yoyou2525@163.com.

 