Saving whole Beautifulsoup array into excel using dataframe and xlsxwriter inside for loop
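After going through a lot of documentation and searching Stack Overflow for answers, I still can't find a solution to my problem.
Basically, I am using BeautifulSoup to scrape a list of data from a website and then store it in Excel. The scraping works fine.
When I run my script, it prints all of the items to the terminal. However, when I try to put the results into a DataFrame and save it to Excel, only the last row is kept and written to the Excel file.
I have tried moving the saving code inside the loop, but the result is the same. I have also tried converting the list back into an array inside the for loop, with the same problem: only the last row is saved to Excel.
I think I am missing the logical approach here. If someone can point me to what I should be looking for, I would greatly appreciate it.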
import pandas as pd
from bs4 import BeautifulSoup

# html and filename are assumed to be defined earlier in the script
soup = BeautifulSoup(html, features="lxml")
soup.find_all("div", {"id": "tbl-lock"})

for listing in soup.find_all('tr'):
    listing.attrs = {}
    assetTime = listing.find_all("td", {"class": "locked"})
    assetCell = listing.find_all("td", {"class": "assetCell"})
    assetValue = listing.find_all("td", {"class": "assetValue"})

    for data in assetCell:
        array = [data.get_text()]

        ### Excel Heading + data
        df = pd.DataFrame({'Cell': array})
        print(array)
        # In here it will print all of the data

### Now we need to save the data to excel
### Create a Pandas Excel writer using XlsxWriter as the Engine
writer = pd.ExcelWriter(filename + '.xlsx', engine='xlsxwriter')

### Convert the dataframe to an XlsxWriter Excel object and skip first row for custom header
df.to_excel(writer, sheet_name='SheetName', startrow=1, header=False)

### Get the xlsxwriter workbook and worksheet objects
workbook = writer.book
worksheet = writer.sheets['SheetName']

### Custom header for Excel
header_format = workbook.add_format({
    'bold': True,
    'text_wrap': True,
    'valign': 'top',
    'fg_color': '#D7E4BC',
    'border': 1
})

### Write the column headers with the defined add_format
print(df)  ### In here it will print only 1 line
for col_num, value in enumerate(df):
    worksheet.write(0, col_num + 1, value, header_format)

### Close Pandas Excel writer and output the Excel file
writer.save()
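This line is the problem: df = pd.DataFrame({'Cell': array})
Here you overwrite df on every iteration, so only the last row is stored.
Instead, initialize df before the loop as df = pd.DataFrame(columns=['Cell'])
Then do this inside the loop:
df = df.append(pd.DataFrame({'Cell': array}), ignore_index=True)
Note that DataFrame.append was removed in pandas 2.0; on newer versions, use df = pd.concat([df, pd.DataFrame({'Cell': array})], ignore_index=True) instead.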
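The difference between overwriting and accumulating can be sketched without any scraping at all. The example below is only an illustration: the sample strings are made up, and collecting values in a plain list before building the DataFrame once is a substitute for the append call above, not the approach from the answer itself.

```python
import pandas as pd

# Hypothetical stand-ins for the text of each scraped "assetCell" <td>.
scraped_texts = ["Asset A", "Asset B", "Asset C"]

# Overwriting: df is rebuilt on every pass, so only the last value survives.
for text in scraped_texts:
    overwritten = pd.DataFrame({"Cell": [text]})

# Accumulating: gather every value first, then build the frame once.
cells = []
for text in scraped_texts:
    cells.append(text)
accumulated = pd.DataFrame({"Cell": cells})

print(len(overwritten))  # number of rows kept by the overwriting loop
print(len(accumulated))  # number of rows kept by the accumulating loop
```

Building the frame once from a list also avoids copying the whole DataFrame on every iteration, which is what repeated appends or concatenations do.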
Edit:
Try this:
soup = BeautifulSoup(html, features="lxml")
soup.find_all("div", {"id": "tbl-lock"})

# Create the empty frame once, before the loop
df = pd.DataFrame(columns=['Cell'])

for listing in soup.find_all('tr'):
    listing.attrs = {}
    assetTime = listing.find_all("td", {"class": "locked"})
    assetCell = listing.find_all("td", {"class": "assetCell"})
    assetValue = listing.find_all("td", {"class": "assetValue"})

    for data in assetCell:
        array = [data.get_text()]

        ### Excel Heading + data
        df = df.append(pd.DataFrame({'Cell': array}), ignore_index=True)
        ## Or this
        # df = df.append(pd.DataFrame({'Cell': array}))
        print(array)
        # In here it will print all of the data

... rest of the code
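For reference, here is a self-contained sketch of the whole flow, from parsing to the formatted Excel header. Several things in it are assumptions rather than part of the question: the sample markup is invented, the helper names collect_cells and save_with_header are hypothetical, and html.parser is used in place of lxml so that the example needs no extra parser.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Invented sample markup; the real page structure is not shown in the question.
html = """
<div id="tbl-lock"><table>
<tr><td class="locked">10:00</td><td class="assetCell">Asset A</td><td class="assetValue">1</td></tr>
<tr><td class="locked">10:05</td><td class="assetCell">Asset B</td><td class="assetValue">2</td></tr>
</table></div>
"""


def collect_cells(markup):
    """Return a DataFrame with one row per td.assetCell found in the markup."""
    soup = BeautifulSoup(markup, features="html.parser")
    cells = []
    for listing in soup.find_all("tr"):
        for data in listing.find_all("td", {"class": "assetCell"}):
            cells.append(data.get_text())
    return pd.DataFrame({"Cell": cells})


def save_with_header(df, path):
    """Write df to path with a custom formatted header row (needs XlsxWriter)."""
    with pd.ExcelWriter(path, engine="xlsxwriter") as writer:
        # Leave row 0 free for the custom header.
        df.to_excel(writer, sheet_name="SheetName", startrow=1, header=False)
        workbook = writer.book
        worksheet = writer.sheets["SheetName"]
        header_format = workbook.add_format({
            "bold": True,
            "text_wrap": True,
            "valign": "top",
            "fg_color": "#D7E4BC",
            "border": 1,
        })
        # Column 0 holds the index, so the data headers start at column 1.
        for col_num, value in enumerate(df.columns):
            worksheet.write(0, col_num + 1, value, header_format)


df = collect_cells(html)
print(df)
```

Calling save_with_header(df, "output.xlsx") would then write the file, provided XlsxWriter is installed. The with block closes the writer on exit, which takes the place of writer.save(), a method that was removed in pandas 2.0.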