How do I export an HTML table to a CSV file using Python?

I scraped an HTML table from the Yahoo Finance website and tried to export it to a CSV file. However, the CSV file does not contain the correct output, even though the output printed on my terminal looks fine. What have I done wrong here?

import requests
from bs4 import BeautifulSoup
import csv
import pandas as pd

mystocks = ["XOM", "CVX", "COP", "EOG"]
stockdata = []

def getData(symbol): 
    headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"}
    url = f"https://finance.yahoo.com/quote/{symbol}/key-statistics"
    soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
    print("Ticker - "+symbol)
    for t in soup.select("table"):
        for tr in t.select("tr:has(td)"):
            for sup in tr.select("sup"):
                sup.extract()
            stockdata = [td.get_text(strip=True) for td in tr.select("td")]
            if len(stockdata) == 2:
                print("{:<50} {}".format(*stockdata))

for item in mystocks:
    stockdata.append(getData(item))

    df = pd.DataFrame(stockdata)
    df.to_csv('file_name.csv')

You are printing the data, not returning it, so getData returns None and nothing useful reaches the DataFrame. If you want all the data in one table, it is a good idea to add a column with the symbol each row came from. You could use something like this:

import requests
from bs4 import BeautifulSoup
import csv
import pandas as pd

mystocks = ["XOM", "CVX", "COP", "EOG"]
stockdata = []

def getData(symbol): 
    headers = {"User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"}
    url = f"https://finance.yahoo.com/quote/{symbol}/key-statistics"
    soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
    print("Ticker - "+symbol)
    for t in soup.select("table"):
        for tr in t.select("tr:has(td)"):
            for sup in tr.select("sup"):
                sup.extract()
            stockdata = [td.get_text(strip=True) for td in tr.select("td")]
            if len(stockdata) == 2:
                # add a column with the symbol so rows can be told apart afterwards
                # (use the function argument; `item` is not defined in this scope)
                yield [symbol] + stockdata

# this will concatenate the rows for all the symbols in mystocks
df = pd.DataFrame([r for item in mystocks for r in getData(item)])
df.to_csv('file_name.csv')
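
To see why the original version produced an unusable CSV, here is a minimal offline sketch of the difference between printing and yielding. It does not touch the network: `SAMPLE`, its placeholder values, and the function names `get_data_printing` and `get_data_yielding` are made up for illustration. It also writes with the standard `csv` module into an in-memory buffer instead of using pandas, so it runs without third-party packages.

```python
import csv
import io

# Hypothetical stand-in for the scraped tables: symbol -> (label, value) pairs.
# The values are placeholders, not real market data.
SAMPLE = {
    "XOM": [("Market Cap", "v1"), ("Beta", "v2")],
    "CVX": [("Market Cap", "v3")],
}

def get_data_printing(symbol):
    """Mirrors the question: prints each row and implicitly returns None."""
    for label, value in SAMPLE[symbol]:
        print("{:<50} {}".format(label, value))

def get_data_yielding(symbol):
    """Mirrors the answer: yields each row, prefixed with its symbol."""
    for label, value in SAMPLE[symbol]:
        yield [symbol, label, value]

# Collecting the return values of the printing version stores only None.
printed = [get_data_printing(s) for s in SAMPLE]

# The generator version produces real rows that can be written out.
rows = [r for s in SAMPLE for r in get_data_yielding(s)]

buffer = io.StringIO()  # in-memory file, so the sketch leaves nothing on disk
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["symbol", "label", "value"])  # header row
writer.writerows(rows)
csv_text = buffer.getvalue()
```

Here `printed` ends up as a list of `None` values, which is what the question's code passed to the DataFrame, while `rows` holds one list per table row. The same `rows` list could equally be passed to `pd.DataFrame` as in the answer above.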
