
Python List[ ] Data to Excel Sheet

I have scraped data from an HTML table into a Python list, but I need to write that list to an Excel sheet and can't find a way to do it. Can anyone help? The list is dynamic: its size changes with the size of the table.

I am working with openpyxl but can't find a way to add the list data to the Excel sheet. The code and its output are given below; I need to write this list to the sheet row by row.

from bs4 import BeautifulSoup

html = """\
<html>
<head></head>
<body>
<section class="smartphone_Px(20px) smartphone_Mb(30px)" data-test="qsp-financial"
         data-yaft-module="tdv2-applet-Financials">
  <div class="Mt(18px) Mb(14px)">
    <div><span class="Mend(10px)"><span>Show</span><!-- react-text: 969 -->:<!-- /react-text --></span>
      <div class="D(ib)">
        <div class="Mend(10px) D(ib) C(black) Fw(b) Pend(10px) H(18px) selected BdEnd Bdc($c-fuji-grey-e)"><span>Income Statement</span>
        </div>
        <a class="Mend(10px) P(0px) M(0px) C($c-fuji-blue-1-b) C(black):h Bd(0px) O(n)"
           href="/quote/VER/balance-sheet?p=VER">
          <div class="Fw(500) D(ib) Pend(10px) H(18px) BdEnd Bdc($c-fuji-grey-e)"><span>Balance Sheet</span></div>
        </a><a class="Mend(10px) P(0px) M(0px) C($c-fuji-blue-1-b) C(black):h Bd(0px) O(n)"
               href="/quote/VER/cash-flow?p=VER">
        <div class="Fw(500) D(ib) Pend(10px) H(18px)"><span>Cash Flow</span></div>
      </a></div>
    </div>
    <div class="Fl(end) smartphone_Mt(4px)">
      <div class="Fz(s) Fw(500) D(ib) H(18px) C(black):h BdEnd Bdc($c-fuji-grey-e) C(black) Pend(15px) Mend(15px)">
        <span>Annual</span></div>
      <button class="P(0px) M(0px) C($c-fuji-blue-1-b) Bd(0px) O(n)">
        <div class="Fz(s) Fw(500) D(ib) H(18px) C(black):h C($c-fuji-blue-1-b)"><span>Quarterly</span></div>
      </button>
    </div>
  </div>
  <div class="Mb(11px)"><h3 class="D(ib) Fz(20px) Fw(b)"><span>Income Statement</span></h3><span
          class="Fz(xs) C($gray) Mstart(25px) smartphone_Mstart(0px) smartphone_D(b) smartphone_Mt(5px)"><span>All numbers in thousands</span></span>
  </div>
  <div class="Mt(10px) Ovx(a) W(100%)">
    <table class="Lh(1.7) W(100%) M(0)">
      <tbody>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fw(b) Fz(15px)"><span>Revenue</span></td>
        <td class="C($gray) Ta(end)"><span>12/31/2018</span></td>
        <td class="C($gray) Ta(end)"><span>12/31/2017</span></td>
        <td class="C($gray) Ta(end)"><span>12/31/2016</span></td>
        <td class="C($gray) Ta(end)"><span>12/31/2015</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Total Revenue</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>1,259,036</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>1,253,148</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>1,335,030</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>1,443,527</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Cost of Revenue</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>126,461</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>128,717</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>144,428</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>146,155</span></td>
      </tr>
      <tr class="Bdbw(0px)! H(36px)">
        <td class="Fw(600) Fz(s) Pb(20px)"><span>Gross Profit</span></td>
        <td class="Fw(600) Fz(s) Ta(end) Pb(20px)"><span>1,132,575</span></td>
        <td class="Fw(600) Fz(s) Ta(end) Pb(20px)"><span>1,124,431</span></td>
        <td class="Fw(600) Fz(s) Ta(end) Pb(20px)"><span>1,190,602</span></td>
        <td class="Fw(600) Fz(s) Ta(end) Pb(20px)"><span>1,297,372</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fw(b) Fz(15px) Pb(8px) Pt(36px)" colspan="5"><span>Operating Expenses</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Research Development</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Selling General and Administrative</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>63,933</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>58,603</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>51,927</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>67,137</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Non Recurring</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Others</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Total Operating Expenses</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>830,212</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>893,522</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>956,193</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>1,035,019</span></td>
      </tr>
      <tr class="Bdbw(0px)! H(36px)">
        <td class="Fw(600) Fz(s) Pb(20px)"><span>Operating Income or Loss</span></td>
        <td class="Fw(600) Fz(s) Ta(end) Pb(20px)"><span>428,824</span></td>
        <td class="Fw(600) Fz(s) Ta(end) Pb(20px)"><span>359,626</span></td>
        <td class="Fw(600) Fz(s) Ta(end) Pb(20px)"><span>378,837</span></td>
        <td class="Fw(600) Fz(s) Ta(end) Pb(20px)"><span>408,508</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fw(b) Fz(15px) Pb(8px) Pt(36px)" colspan="5"><span>Income from Continuing Operations</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Total Other Income/Expenses Net</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-515,448</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-301,249</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-448,588</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-542,911</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Earnings Before Interest and Taxes</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>428,824</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>359,626</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>378,837</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>408,508</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Interest Expense</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-280,887</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-289,766</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-317,376</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-358,392</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Income Before Tax</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-86,624</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>58,377</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-69,751</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-134,403</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Income Tax Expense</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>5,101</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>6,882</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>7,136</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>4,589</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Minority Interest</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>143,085</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>158,598</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>172,172</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>189,972</span></td>
      </tr>
      <tr class="Bdbw(0px)! H(36px)">
        <td class="Fw(600) Fz(s) Pb(20px)"><span>Net Income From Continuing Ops</span></td>
        <td class="Fw(600) Fz(s) Ta(end) Pb(20px)"><span>-91,725</span></td>
        <td class="Fw(600) Fz(s) Ta(end) Pb(20px)"><span>51,495</span></td>
        <td class="Fw(600) Fz(s) Ta(end) Pb(20px)"><span>-76,887</span></td>
        <td class="Fw(600) Fz(s) Ta(end) Pb(20px)"><span>-138,992</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fw(b) Fz(15px) Pb(8px) Pt(36px)" colspan="5"><span>Non-recurring Events</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Discontinued Operations</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>3,695</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-19,117</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-123,937</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)"><span>-184,500</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Extraordinary Items</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Effect Of Accounting Changes</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
      </tr>
      <tr class="Bdbw(0px)! H(36px)">
        <td class="Pb(20px)"><span>Other Items</span></td>
        <td class="Pb(20px)">-</td>
        <td class="Pb(20px)">-</td>
        <td class="Pb(20px)">-</td>
        <td class="Pb(20px)">-</td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fw(b) Fz(15px) Pb(8px) Pt(36px)" colspan="5"><span>Net Income</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fw(600) Py(8px) Pt(36px)"><span>Net Income</span></td>
        <td class="Fw(600) Ta(end) Py(8px) Pt(36px)"><span>-85,774</span></td>
        <td class="Fw(600) Ta(end) Py(8px) Pt(36px)"><span>31,818</span></td>
        <td class="Fw(600) Ta(end) Py(8px) Pt(36px)"><span>-195,863</span></td>
        <td class="Fw(600) Ta(end) Py(8px) Pt(36px)"><span>-316,353</span></td>
      </tr>
      <tr class="Bdbw(1px) Bdbc($c-fuji-grey-c) Bdbs(s) H(36px)">
        <td class="Fz(s) H(35px) Va(m)"><span>Preferred Stock And Other Adjustments</span></td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
        <td class="Fz(s) Ta(end) Pstart(10px)">-</td>
      </tr>
      <tr class="Bdbw(0px)! H(36px)">
        <td class="Fw(600) W(40%)"><span>Net Income Applicable To Common Shares</span></td>
        <td class="Fw(600) Ta(end)"><span>-157,708</span></td>
        <td class="Fw(600) Ta(end)"><span>-40,565</span></td>
        <td class="Fw(600) Ta(end)"><span>-268,247</span></td>
        <td class="Fw(600) Ta(end)"><span>-388,655</span></td>
      </tr>
      </tbody>
    </table>
  </div>
</section>
</body>
</html>"""

soup = BeautifulSoup(html, 'html5lib')
tables = soup.findAll('table')
tableE = []

for table in tables:
    rows = []
    for row in table.findAll('tr')[0:]:
        cells = []
        for cell in row.findAll('td'):
            text = cell.text
            cells.append(text)
        rows.append(cells)
    tableE.append(rows)
print(tableE)

It shows:

[[['Revenue', '12/31/2018', '12/31/2017', '12/31/2016', '12/31/2015'],
  ['Total Revenue', '1,259,036', '1,253,148', '1,335,030', '1,443,527'],
  ['Cost of Revenue', '126,461', '128,717', '144,428', '146,155'],
  ['Gross Profit', '1,132,575', '1,124,431', '1,190,602', '1,297,372'],
  ['Operating Expenses'],
  ['Research Development', '-', '-', '-', '-'],
  ['Selling General and Administrative', '63,933', '58,603', '51,927', '67,137'],
  ['Non Recurring', '-', '-', '-', '-'],
  ['Others', '-', '-', '-', '-'],
  ['Total Operating Expenses', '830,212', '893,522', '956,193', '1,035,019'],
  ['Operating Income or Loss', '428,824', '359,626', '378,837', '408,508'],
  ['Income from Continuing Operations'],
  ['Total Other Income/Expenses Net', '-515,448', '-301,249', '-448,588', '-542,911'],
  ['Earnings Before Interest and Taxes', '428,824', '359,626', '378,837', '408,508'],
  ['Interest Expense', '-280,887', '-289,766', '-317,376', '-358,392'],
  ['Income Before Tax', '-86,624', '58,377', '-69,751', '-134,403'],
  ['Income Tax Expense', '5,101', '6,882', '7,136', '4,589'],
  ['Minority Interest', '143,085', '158,598', '172,172', '189,972'],
  ['Net Income From Continuing Ops', '-91,725', '51,495', '-76,887', '-138,992'],
  ['Non-recurring Events'],
  ['Discontinued Operations', '3,695', '-19,117', '-123,937', '-184,500'],
  ['Extraordinary Items', '-', '-', '-', '-'],
  ['Effect Of Accounting Changes', '-', '-', '-', '-'],
  ['Other Items', '-', '-', '-', '-'],
  ['Net Income'],
  ['Net Income', '-85,774', '31,818', '-195,863', '-316,353'],
  ['Preferred Stock And Other Adjustments', '-', '-', '-', '-'],
  ['Net Income Applicable To Common Shares', '-157,708', '-40,565', '-268,247', '-388,655']]]

You should create a workbook and append the data to it. Try this code snippet:

from bs4 import BeautifulSoup
from openpyxl import Workbook

html = """ data """  # placeholder: use the HTML string from the question here
soup = BeautifulSoup(html, 'html5lib')  # html5lib must be installed, but does not need to be imported
tables = soup.findAll('table')
tableE = []

for table in tables:
    rows = []
    for row in table.findAll('tr')[0:]:
        cells = []
        for cell in row.findAll('td'):
            text = cell.text
            cells.append(text)
        rows.append(cells)
    tableE.append(rows)
wb = Workbook()
ws = wb.active
for tab in tableE[0]: # tableE[0] is a list of list
    ws.append(tab) # Appends each list as a row in the workbook
wb.save("test.xlsx")
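Note that the snippet above writes every cell as text, so a value like '1,259,036' ends up in Excel as a string rather than a number. If you want numeric cells, here is a minimal sketch of one way to do it. The helper names to_number and write_rows are made up for this example and are not part of openpyxl; the sketch also assumes all figures are whole numbers with comma thousands separators, as in the question's table.

```python
def to_number(text):
    """Return an int for strings like '1,259,036' or '-515,448'.

    Anything that is not a whole number (labels, dates, a lone '-')
    is returned as stripped text.
    """
    cleaned = text.strip().replace(",", "")
    try:
        return int(cleaned)
    except ValueError:
        return text.strip()


def write_rows(rows, path):
    """Append each row to a new workbook, converting numeric strings first."""
    # Imported here so to_number() can be used on its own.
    from openpyxl import Workbook

    wb = Workbook()
    ws = wb.active
    for row in rows:
        ws.append([to_number(cell) for cell in row])
    wb.save(path)
```

With the list from the question you would call write_rows(tableE[0], "test.xlsx"). Rows with a single cell, such as ['Operating Expenses'], are appended as they are.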

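Just create a pandas DataFrame from the list and save it to Excel. The output path needs an .xlsx extension so pandas can pick a writer engine, and list_to_Save should be a single table (a list of rows), for example tableE[0] from the question.

import pandas
pandas.DataFrame(list_to_Save).to_excel("output.xlsx")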

Here is how you do it:

import pandas as pd

values = [[['Revenue', '12/31/2018', '12/31/2017', '12/31/2016', '12/31/2015'], ['Total Revenue', '1,259,036', '1,253,148', '1,335,030', '1,443,527'], ['Cost of Revenue', '126,461', '128,717', '144,428', '146,155'], ['Gross Profit', '1,132,575', '1,124,431', '1,190,602', '1,297,372'], ['Operating Expenses'], ['Research Development', '-', '-', '-', '-'], ['Selling General and Administrative', '63,933', '58,603', '51,927', '67,137'], ['Non Recurring', '-', '-', '-', '-'], ['Others', '-', '-', '-', '-'], ['Total Operating Expenses', '830,212', '893,522', '956,193', '1,035,019'], ['Operating Income or Loss', '428,824', '359,626', '378,837', '408,508'], ['Income from Continuing Operations'], ['Total Other Income/Expenses Net', '-515,448', '-301,249', '-448,588', '-542,911'], ['Earnings Before Interest and Taxes', '428,824', '359,626', '378,837', '408,508'], ['Interest Expense', '-280,887', '-289,766', '-317,376', '-358,392'], ['Income Before Tax', '-86,624', '58,377', '-69,751', '-134,403'], ['Income Tax Expense', '5,101', '6,882', '7,136', '4,589'], ['Minority Interest', '143,085', '158,598', '172,172', '189,972'], ['Net Income From Continuing Ops', '-91,725', '51,495', '-76,887', '-138,992'], ['Non-recurring Events'], ['Discontinued Operations', '3,695', '-19,117', '-123,937', '-184,500'], ['Extraordinary Items', '-', '-', '-', '-'], ['Effect Of Accounting Changes', '-', '-', '-', '-'], ['Other Items', '-', '-', '-', '-'], ['Net Income'], ['Net Income', '-85,774', '31,818', '-195,863', '-316,353'], ['Preferred Stock And Other Adjustments', '-', '-', '-', '-'], ['Net Income Applicable To Common Shares', '-157,708', '-40,565', '-268,247', '-388,655']]]

columns = [i[0] for i in values[0]]
data = [(i[1:]) for i in values[0]]
df = pd.DataFrame(data).transpose()
df.columns = columns
df.to_csv("test.csv")
print(df)

If you want Excel, use:

df.to_excel("test.xlsx", sheet_name='sheet1', engine='xlsxwriter')

instead of

df.to_csv("test.csv")

but you must first install xlsxwriter with pip:

pip install xlsxwriter
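One thing to watch with the transpose approach above: section-heading rows such as ['Operating Expenses'] have only a label, so they become empty columns, and 'Net Income' appears twice as a column name. A rough sketch of filtering those rows out first, in plain Python, is shown here. The rows list is a shortened sample of the question's data (two date columns only), and the variable names are just for illustration.

```python
# Shortened sample of the scraped table from the question.
rows = [
    ['Revenue', '12/31/2018', '12/31/2017'],
    ['Total Revenue', '1,259,036', '1,253,148'],
    ['Operating Expenses'],  # section heading: label only, no values
    ['Total Operating Expenses', '830,212', '893,522'],
]

# Keep only rows that carry values after the label.
data_rows = [row for row in rows if len(row) > 1]

# The first cell of each remaining row becomes a column name.
columns = [row[0] for row in data_rows]

# zip(*...) transposes the value cells: one record per date column.
records = [list(values) for values in zip(*(row[1:] for row in data_rows))]

print(columns)
print(records)
```

From there, pd.DataFrame(records, columns=columns) should give the same layout as the answer above, minus the empty heading columns.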

To make it less verbose, you can try the following:

from openpyxl import Workbook
from bs4 import BeautifulSoup

wb = Workbook()
ws = wb.active
soup = BeautifulSoup(html, 'html5lib')
for items in soup.find('table').find_all("tr"):
    data = [item.text for item in items.find_all("td")]
    print(data)
    ws.append(data)
wb.save("tabular_content.xlsx")
