
Table data extraction with BeautifulSoup

I would like to get table data from the website below:

https://www.isyatirim.com.tr/tr-tr/analiz/hisse/Sayfalar/sirket-karti.aspx?hisse=ALCAR

My code gets the table data, but I am having a problem appending it to a DataFrame.

import requests
from bs4 import BeautifulSoup
import pandas as pd

stock = 'ALCAR'
page = requests.get(f"https://www.isyatirim.com.tr/tr-tr/analiz/hisse/Sayfalar/sirket-karti.aspx?hisse={stock}")

soup = BeautifulSoup(page.content, 'html.parser')

table = soup.find('tbody', id="tbodyMTablo")
print(table)


for j in table.find_all('tr'):
    row_data = j.find_all('td')
    row = [i.text for i in row_data]

    #print(row)
    df = pd.DataFrame(row).transpose()
    df.to_csv('xxx.csv')
    print(df)

Inside the for loop you build a new DataFrame from each row and write it to the CSV file, so every iteration overwrites the file and only the last row survives. The solution below first accumulates all rows and builds a single pandas DataFrame from them, and only then saves that DataFrame to CSV.

import requests
from bs4 import BeautifulSoup
import pandas as pd

stock = 'ALCAR'
page = requests.get(f"https://www.isyatirim.com.tr/tr-tr/analiz/hisse/Sayfalar/sirket-karti.aspx?hisse={stock}")

soup = BeautifulSoup(page.content, 'html.parser')

table = soup.find('tbody', id="tbodyMTablo")

rows = []  # collect all rows first
for j in table.find_all('tr'):
    row_data = j.find_all('td')
    row = [i.text for i in row_data]
    rows.append(row)  # append row to the list

# DataFrame.append was removed in pandas 2.0, so build the DataFrame once
df = pd.DataFrame(rows)
df.to_csv('xxx.csv')  # save dataframe to csv file, once, after the loop
print(df)
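The same collect-then-build pattern can be tried offline, without requesting the live page. This is a minimal sketch: only the `tbodyMTablo` id comes from the question, while `SAMPLE_HTML`, its cell values and the helper `rows_from_tbody` are hypothetical stand-ins. It also guards against `soup.find` returning `None` when the element is missing.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; only the tbody id is taken
# from the question, the rows and values are placeholders.
SAMPLE_HTML = """
<table>
  <tbody id="tbodyMTablo">
    <tr><td>Item A</td><td>1</td><td>2</td></tr>
    <tr><td>Item B</td><td>3</td><td>4</td></tr>
  </tbody>
</table>
"""


def rows_from_tbody(html, tbody_id):
    """Hypothetical helper: return the <td> texts of each <tr> in the given <tbody>."""
    soup = BeautifulSoup(html, "html.parser")
    tbody = soup.find("tbody", id=tbody_id)
    if tbody is None:
        # find() returns None when no matching element exists
        raise ValueError(f"no <tbody id={tbody_id!r}> found")
    rows = []
    for tr in tbody.find_all("tr"):
        # one list of stripped cell texts per table row
        rows.append([td.get_text(strip=True) for td in tr.find_all("td")])
    return rows


rows = rows_from_tbody(SAMPLE_HTML, "tbodyMTablo")
df = pd.DataFrame(rows)            # build the DataFrame once, after the loop
csv_text = df.to_csv(index=False)  # no path given: returns the CSV as a string
print(csv_text)
```

Passing the whole list of rows to `pd.DataFrame` once avoids both the per-iteration file overwrite and the removed `DataFrame.append` method.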

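In addition: since you are already working with pandas, you can simply use pandas.read_html to create your DataFrame. It parses the HTML tables of a page into a list of DataFrames, and the attrs argument keeps only the table whose class is excelexport, which is why the result is indexed with [0].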

import pandas as pd

stock = 'ALCAR'
df = pd.read_html(f'https://www.isyatirim.com.tr/tr-tr/analiz/hisse/Sayfalar/sirket-karti.aspx?hisse={stock}', attrs = {'class':'excelexport'})[0]

df.to_csv('filename.csv')
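To see how the `attrs` filter picks one table out of several, here is a small offline sketch. `SAMPLE_HTML` and its contents are hypothetical; only the `excelexport` class comes from the answer. It assumes a parser backend for `read_html` is installed (`lxml`, or `beautifulsoup4` together with `html5lib`), and wraps the literal HTML in `StringIO` because recent pandas versions deprecate passing raw HTML strings.

```python
from io import StringIO

import pandas as pd

# Hypothetical page with two tables; only the one with class="excelexport" is wanted
SAMPLE_HTML = """
<table class="other"><tr><td>ignore me</td></tr></table>
<table class="excelexport">
  <thead><tr><th>Item</th><th>Value</th></tr></thead>
  <tbody>
    <tr><td>Item A</td><td>1</td></tr>
    <tr><td>Item B</td><td>2</td></tr>
  </tbody>
</table>
"""

# read_html always returns a list of DataFrames, one per matching <table>
tables = pd.read_html(StringIO(SAMPLE_HTML), attrs={"class": "excelexport"})
df = tables[0]
print(len(tables))
print(df)
```

The `<th>` cells of the `<thead>` become the column names, and numeric-looking cells are converted to numbers.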

Or, if you do not need the header row and the index:

df.to_csv('filename.csv', header=False, index=False)
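The effect of `header=False` and `index=False` is easiest to see by calling `to_csv` without a path, in which case it returns the CSV text instead of writing a file. The small DataFrame below is a made-up placeholder for the scraped table.

```python
import pandas as pd

# Small placeholder frame standing in for the scraped table
df = pd.DataFrame([["Item A", 1], ["Item B", 2]])

# Default: the column labels (0, 1) and the row index (0, 1) are written too
with_meta = df.to_csv()
# header=False / index=False drop both, leaving only the cell values
values_only = df.to_csv(header=False, index=False)

print(with_meta)
print(values_only)
```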
