
Convert HTML scraped data into a CSV file for reading and writing using Python BeautifulSoup

Please see the following link:

http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=505200&expandable=0

I have tried the following:

from bs4 import BeautifulSoup as soup
import csv
from pandas import read_csv
import requests

file_path=r'C:\Users\PreciseT3\Desktop\EicherStockDetails.csv'
eicher_stock_url='http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=505200&expandable=0'
get_url=requests.get(eicher_stock_url)
target_table=soup(get_url.text,'lxml')
extracted_table_data=target_table.find('table',id='acr')
datasets=[]
col_names=[]
count=1

with open(file_path,'r+') as file:
 writer=csv.writer(file)
 col_names.append('Years')
 for years_row in extracted_table_data.find('tr').find_all('td',class_='TTHeader'):    
    if not(years_row.get_text()=='(in Cr.)'):
        print(years_row.get_text())
        col_names.append(years_row.get_text()) 

 writer.writerow(col_names)

with open(file_path,'r+') as file:
 writer=csv.writer(file)
 for row_headings in extracted_table_data.find('tr').find('td',class_='TTRow_left'):
  col_names.append(row_headings)
  for row_values in extracted_table_data.find('tr').find_all('td',class_='TTRow_right',text=lambda x:'6,188.03' in x or '3,031.22' in x or '1,702.47' in x or '1,049.26' in x or '670.95' in x):
   col_names.append(row_values.get_text())

 writer.writerow(col_names)

My result is as follows:

Years,2016,2014,2013,2012,2011,Revenue,"6,188.03","3,031.22","1,702.47","1,049.26",670.95

My requirements are:

  • Instead of the '(in Cr.)' column name, I need to use 'year'.

  • I need to explore the data, write it to a CSV file, and transpose (T) the rows and columns.

  • I need to add extra columns from another HTML page (some examples would help).

Please help me; I cannot get any further. Thanks in advance.

I've modified this code a bit, but the logic should be straightforward to follow. I've used Cr and Year as the splitter for this fundamental analysis data, but you can switch to millions/quarter as well by adjusting the "main_split" part of the code.
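Before the full script, here is the separator-row idea on toy data. This is only a sketch: the rows below are made up to mimic the scraped cell text (the two Revenue figures come from the question's output), and the real page needs different slice offsets.

```python
# Toy rows standing in for the <tr>/<td> text scraped from the page.
# The layout is illustrative; only the '--' separator convention matters.
rows = [
    ["--", "--"],
    ["(in Cr.)", "2016", "2014"],
    ["Revenue", "6,188.03", "3,031.22"],
    ["--", "--"],
    ["footer"],
]

# Flag every row that contains a '--' cell.
is_separator = [any("--" in cell for cell in row) for row in rows]

# Positions of the separator rows.
split_index = [i for i, flag in enumerate(is_separator) if flag]

# Keep the rows strictly between the first two separators.
block = rows[split_index[0] + 1 : split_index[1]]
print(block)
```

On the real page there are many such separator rows, which is why the script picks a specific pair of them.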

# Python 3 version: urllib2 became urllib.request, and zip() returns an iterator
from urllib.request import urlopen

from bs4 import BeautifulSoup
import pandas as pd

url = 'http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=505200&expandable=0'
html = urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

# Collect the text of every cell, one list per table row
main = []
for tr in soup.find_all('tr'):
    main.append([td.text for td in tr.find_all('td')])

# Indices of the rows containing '--', which separate the sections of the page
split_index = [i for i, row in enumerate(main) if any('--' in x for x in row)]

# Rows of the yearly (in Cr.) block; adjust these indices to select another block
main_split = main[(split_index[7]+2):(split_index[8]-2)]

# Transpose so that each year becomes a row; the first transposed row holds the column names
main_zip = list(zip(*main_split))
header = main_zip.pop(0)
DF = pd.DataFrame(main_zip, columns=[x.replace(' ', '_') for x in header])
print(DF)
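
For the first two requirements (renaming '(in Cr.)' and writing the transposed table to CSV), a minimal sketch follows. It uses the figures from the question's output as sample rows instead of fetching the page, and it writes to an in-memory buffer; the names `table_rows` and `buffer` are illustrative.

```python
import io

import pandas as pd

# Sample rows in the page's layout: the first cell is the label,
# the rest are per-year values (figures taken from the question's output).
table_rows = [
    ["(in Cr.)", "2016", "2014", "2013", "2012", "2011"],
    ["Revenue", "6,188.03", "3,031.22", "1,702.47", "1,049.26", "670.95"],
]

# Transpose so that each year becomes a row and each metric a column.
header, *records = zip(*table_rows)

# Rename the '(in Cr.)' header to 'Year'.
columns = ["Year" if name == "(in Cr.)" else name for name in header]
df = pd.DataFrame(records, columns=columns)

# Strip thousands separators so the values become numeric.
df["Revenue"] = df["Revenue"].str.replace(",", "", regex=False).astype(float)

# Write to an in-memory buffer here; pass a file path to to_csv()
# instead to write the CSV to disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Reading the file back is then a matter of calling `pd.read_csv` on the same path.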

Hope this helps. Cheers.
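
As for adding columns from another page, one possible approach is to scrape the second page into its own DataFrame and merge the two on the year. The sketch below assumes both tables share a `Year` column; the `Net_Profit` column and its numbers are invented placeholders, not BSE data.

```python
import pandas as pd

# Table built from the first page (values from the question's output).
revenue = pd.DataFrame({
    "Year": ["2016", "2014", "2013"],
    "Revenue": [6188.03, 3031.22, 1702.47],
})

# Hypothetical table from a second page; the column name and numbers
# are placeholders for illustration only.
extra = pd.DataFrame({
    "Year": ["2016", "2014", "2012"],
    "Net_Profit": [1.0, 2.0, 3.0],
})

# A left merge keeps every year from the first table; years missing
# from the second page get NaN in the new column.
combined = revenue.merge(extra, on="Year", how="left")
print(combined)
```

Using `how="left"` keeps the first page as the reference; `how="outer"` would keep the years from both pages instead.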
