Organizing HTML data scraped using BeautifulSoup and Python
Convert scraped HTML data into a CSV file (read and write) using Python and BeautifulSoup
Please refer to the following link:
http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=505200&expandable=0
I tried the following:
```python
from bs4 import BeautifulSoup as soup
import csv
from pandas import read_csv
import requests

file_path = r'C:\Users\PreciseT3\Desktop\EicherStockDetails.csv'
eicher_stock_url = 'http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=505200&expandable=0'
get_url = requests.get(eicher_stock_url)
target_table = soup(get_url.text, 'lxml')
extracted_table_data = target_table.find('table', id='acr')

datasets = []
col_names = []
count = 1

with open(file_path, 'r+') as file:
    writer = csv.writer(file)
    col_names.append('Years')
    for years_row in extracted_table_data.find('tr').find_all('td', class_='TTHeader'):
        if not (years_row.get_text() == '(in Cr.)'):
            print(years_row.get_text())
            col_names.append(years_row.get_text())
    writer.writerow(col_names)

with open(file_path, 'r+') as file:
    writer = csv.writer(file)
    for row_headings in extracted_table_data.find('tr').find('td', class_='TTRow_left'):
        col_names.append(row_headings)
    for row_values in extracted_table_data.find('tr').find_all('td', class_='TTRow_right', text=lambda x: '6,188.03' in x or '3,031.22' in x or '1,702.47' in x or '1,049.26' in x or '670.95' in x):
        col_names.append(row_values.get_text())
    writer.writerow(col_names)
```
My output looks like this:

```
Years,2016,2014,2013,2012,2011,Revenue,"6,188.03","3,031.22","1,702.47","1,049.26",670.95
```
My requirements are:

1. Instead of the '(in Cr.)' column name, I need to change it to 'Year'.
2. I want to write the data out as a CSV file (in a form CSV readers support), and I also want to transpose (T) the rows and columns.
3. I need to add extra columns from another HTML page (an example would be appreciated).

Please help; I cannot get any further. Thanks in advance.
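The first two requirements (renaming the '(in Cr.)' label to 'Year', transposing, and writing a CSV) can be sketched with pandas. The HTML fragment below is a minimal stand-in that mimics the structure of the page's `acr` table, since the live page layout may differ; the output filename is arbitrary:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Minimal stand-in for the financials table (the real page uses table id='acr')
html = """
<table id="acr">
  <tr><td class="TTHeader">(in Cr.)</td><td class="TTHeader">2016</td><td class="TTHeader">2014</td></tr>
  <tr><td class="TTRow_left">Revenue</td><td class="TTRow_right">6,188.03</td><td class="TTRow_right">3,031.22</td></tr>
</table>
"""

table = BeautifulSoup(html, 'html.parser').find('table', id='acr')
rows = [[td.get_text() for td in tr.find_all('td')] for tr in table.find_all('tr')]

# First row is the header; first column holds the metric names
df = pd.DataFrame(rows[1:], columns=rows[0]).set_index('(in Cr.)')

# Transpose so years become rows and metrics become columns,
# then replace the '(in Cr.)' label with 'Year'
df = df.T
df.index.name = 'Year'
df.columns.name = None

df.to_csv('eicher_financials.csv')  # write the reshaped table out
print(df)
```

With the full page, `rows` would contain every row of the `acr` table and the same reshaping applies unchanged.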
I have made some modifications to the code, but the logic should be easy to follow. I used 'Cr' and 'Year' as the splitter for this fundamentals data, but you can also switch to Million/Qtr by adjusting the 'main_split' part of the code.
```python
from bs4 import BeautifulSoup
import urllib.request  # urllib2 in the original Python 2 version
import pandas as pd

url = 'http://www.bseindia.com/stock-share-price/stockreach_financials.aspx?scripcode=505200&expandable=0'
html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, "html.parser")

# Collect every table row on the page as a list of cell texts
main = []
for tr in soup.find_all('tr'):
    main.append([td.text for td in tr.find_all('td')])

# Rows containing '--' act as separators between the page's sub-tables
splitter = [any('--' in cell for cell in row) for row in main]
split_index = [i for i, is_sep in enumerate(splitter) if is_sep]

# Slice out the annual (Cr/Year) block; adjust these indices for Million/Qtr
main_split = main[(split_index[7] + 2):(split_index[8] - 2)]

# Transpose so that each metric becomes a column
main_zip = [list(col) for col in zip(*main_split)]

# The first transposed row holds the metric names; pop it off for column headers
headers = [x.replace(' ', '_') for x in main_zip.pop(0)]
DF = pd.DataFrame(main_zip, columns=headers)
print(DF)
```
Hope this helps. Cheers.
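For the remaining requirement (extra columns from another HTML page), once both pages have been reshaped into DataFrames keyed by year, `DataFrame.join` aligns them on the shared index. A minimal sketch, where the `EPS` figures are made-up placeholders standing in for data scraped from the second page:

```python
import pandas as pd

# Financials already reshaped with years as the index, as in the answer above
financials = pd.DataFrame(
    {'Revenue': [6188.03, 3031.22]},
    index=pd.Index(['2016', '2014'], name='Year'),
)

# Hypothetical extra column scraped from another page, also keyed by year
extra = pd.DataFrame(
    {'EPS': [493.0, 229.0]},
    index=pd.Index(['2016', '2014'], name='Year'),
)

# join aligns on the shared 'Year' index; how='left' keeps every financials
# row even when the other page is missing a year (those cells become NaN)
combined = financials.join(extra, how='left')
combined.to_csv('combined.csv')
print(combined)
```

Any number of additional pages can be folded in the same way, one `join` per scraped DataFrame.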
Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original source. For questions, contact: yoyou2525@163.com.