Saving a whole BeautifulSoup array into Excel using a DataFrame and XlsxWriter inside a for loop

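After going through a lot of documentation and searching Stack Overflow for answers, I still can't find a solution to my problem.

Basically, I'm using BeautifulSoup to scrape a list of data from a website and then store it in Excel. The scraping works fine.

When I run the script, it prints all of the items to the terminal. However, when I try to save the results to a DataFrame and write that to Excel, only the last row ends up in the DataFrame, and that single row is all that gets saved to Excel.

I've tried putting the saving code inside the loop, with the same result. I've also tried converting the list back into an array inside the for loop, with the same problem: only the last row is saved to Excel.

I think I'm missing something logical here. I'd appreciate it if someone could point me to what I should be looking for.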

        soup = BeautifulSoup(html, features="lxml")
        soup.find_all("div", {"id":"tbl-lock"})

        for listing in soup.find_all('tr'):

            listing.attrs = {}

            assetTime = listing.find_all("td", {"class": "locked"})
            assetCell = listing.find_all("td", {"class": "assetCell"})
            assetValue = listing.find_all("td", {"class": "assetValue"})

            for data in assetCell:

                array = [data.get_text()]

                ### Excel Heading + data
                df = pd.DataFrame({'Cell': array
                                    })
                print(array)
                # In here it will print all of the data


        ### Now we need to save the data to excel
        ### Create a Pandas Excel writer using XlsxWriter as the Engine
        writer = pd.ExcelWriter(filename+'.xlsx', engine='xlsxwriter')

        ### Convert the dataframe to an XlsxWriter Excel object and skip first row for custom header
        df.to_excel(writer, sheet_name='SheetName', startrow=1, header=False)

        ### Get the xlsxwriter workbook and worksheet objects

        workbook = writer.book
        worksheet = writer.sheets['SheetName']

        ### Custom header for Excel
        header_format = workbook.add_format({
            'bold': True,
            'text_wrap': True,
            'valign': 'top',
            'fg_color': '#D7E4BC',
            'border': 1
        })

        ### Write the column headers with the defined add_format
        print(df) ### In here it will print only 1 line
        for col_num, value in enumerate(df):

            worksheet.write(0, col_num +1, value, header_format)

        ### Close the Pandas Excel writer to output the Excel file
        ### (writer.save() was deprecated in pandas 1.5 and removed in 2.0)
        writer.close()
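The Excel-writing half of this code is not where rows get lost; it writes whatever df holds when the loop ends. A minimal sketch of that step on its own follows, assuming xlsxwriter is installed; the sample rows and the temporary out.xlsx path are hypothetical. Because to_excel writes the index into column 0 by default, the custom header starts at column 1, which is why the code uses col_num + 1. The with block closes the writer, which saves the file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the scraped cells
df = pd.DataFrame({'Cell': ["Gold", "Silver", "Copper"]})

# Hypothetical output location in a fresh temporary directory
path = os.path.join(tempfile.mkdtemp(), "out.xlsx")

with pd.ExcelWriter(path, engine='xlsxwriter') as writer:
    # startrow=1 and header=False leave row 0 free for the custom header
    df.to_excel(writer, sheet_name='SheetName', startrow=1, header=False)

    workbook = writer.book
    worksheet = writer.sheets['SheetName']
    header_format = workbook.add_format({
        'bold': True,
        'text_wrap': True,
        'valign': 'top',
        'fg_color': '#D7E4BC',
        'border': 1,
    })

    # Column 0 holds the index, so the data headers start at column 1
    for col_num, value in enumerate(df.columns):
        worksheet.write(0, col_num + 1, value, header_format)
# Leaving the with block closes the writer and writes the file to disk

print(os.path.getsize(path) > 0)
```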

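This line is the problem: df = pd.DataFrame({'Cell': array}). It creates a new DataFrame on every pass through the loop and overwrites df, so only the last row is kept.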
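A minimal reproduction without any scraping shows the same symptom; the cell values here are made up. print inside the loop sees every value, but df ends up holding only the last one:

```python
import pandas as pd

# Hypothetical cell texts standing in for the scraped assetCell values
cells = ["A1", "B2", "C3"]

seen = 0
for text in cells:
    array = [text]
    # This rebinds df on every pass, discarding the previous DataFrame
    df = pd.DataFrame({'Cell': array})
    print(array)  # prints every value, as in the question
    seen += 1

print(seen)  # 3 values were processed...
print(df)    # ...but df holds only the last one
```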

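Instead, initialize df once, before the loop, as df = pd.DataFrame(columns=['Cell']) (use the same column name, 'Cell', as the rows you add, or you will end up with two columns), and then do this inside the loop:

df = pd.concat([df, pd.DataFrame({'Cell': array})], ignore_index=True)

On pandas versions before 2.0 this could also be written as df = df.append(pd.DataFrame({'Cell': array}), ignore_index=True); DataFrame.append was deprecated in pandas 1.4 and removed in 2.0.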
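Concatenating inside the loop copies every existing row on each pass. A common alternative, swapped in here and not part of the original answer, is to collect the plain values in a list and build the DataFrame once after the loop. In this sketch the cell values are made up; both approaches end up with the same column:

```python
import pandas as pd

# Hypothetical cell texts standing in for the scraped values
cells = ["A1", "B2", "C3"]

# Grow the DataFrame inside the loop: start empty, concatenate one
# single-row DataFrame per value (pd.concat replaces DataFrame.append)
df_concat = pd.DataFrame(columns=['Cell'])
for text in cells:
    df_concat = pd.concat([df_concat, pd.DataFrame({'Cell': [text]})],
                          ignore_index=True)

# Alternative: collect plain values in a list, build the DataFrame once
rows = []
for text in cells:
    rows.append(text)
df_list = pd.DataFrame({'Cell': rows})

print(df_concat['Cell'].tolist())
print(df_list['Cell'].tolist())
```

With more than one column, the list approach extends naturally: append one tuple per row and pass the column names via pd.DataFrame(rows, columns=[...]).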

Edit:

Try this:

        import pandas as pd
        from bs4 import BeautifulSoup

        soup = BeautifulSoup(html, features="lxml")
        soup.find_all("div", {"id": "tbl-lock"})  # note: this result is not used

        # Create df ONCE, before the loop; the column name must match the
        # 'Cell' key used when adding rows below
        df = pd.DataFrame(columns=['Cell'])
        for listing in soup.find_all('tr'):

            listing.attrs = {}

            assetTime = listing.find_all("td", {"class": "locked"})
            assetCell = listing.find_all("td", {"class": "assetCell"})
            assetValue = listing.find_all("td", {"class": "assetValue"})

            for data in assetCell:

                array = [data.get_text()]

                ### Excel Heading + data: add rows to df instead of replacing it
                df = pd.concat([df, pd.DataFrame({'Cell': array})], ignore_index=True)
                # On pandas < 2.0 this was also possible:
                # df = df.append(pd.DataFrame({'Cell': array}), ignore_index=True)

                print(array)
                # In here it will print all of the data

The rest of the code stays the same.
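To test the scraping loop end to end without fetching the real page, the sketch below runs against a small inline HTML table. The markup and values are hypothetical (only the class names come from the question). It uses Python's built-in "html.parser" instead of lxml so that only bs4 and pandas are needed, and it collects the texts in a list and builds the DataFrame once after the loop rather than growing it row by row:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup; only the id and class names mirror the question's page
html = """
<div id="tbl-lock">
  <table>
    <tr><td class="locked">09:00</td><td class="assetCell">Gold</td><td class="assetValue">1.5</td></tr>
    <tr><td class="locked">09:05</td><td class="assetCell">Silver</td><td class="assetValue">2.5</td></tr>
    <tr><td class="locked">09:10</td><td class="assetCell">Copper</td><td class="assetValue">3.5</td></tr>
  </table>
</div>
"""

soup = BeautifulSoup(html, features="html.parser")

rows = []
for listing in soup.find_all('tr'):
    # Collect every assetCell text in this row; nothing gets overwritten
    for data in listing.find_all("td", {"class": "assetCell"}):
        rows.append(data.get_text())

# Build the DataFrame once, after all rows have been collected
df = pd.DataFrame({'Cell': rows})
print(df)
```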


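Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you republish this content, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.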

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM