Pandas read_html results in TypeError

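I am using bs4 to parse an HTML page and extract a table (a sample is given below), and I am trying to load it into pandas. When I call pddataframe = pd.read_html(LOTable,skiprows=2, flavor=['bs4']) I get the error listed below, even though I can print the table as prettified by bs4.

Any suggestions on how to solve this without fetching each td and reading the cells in one at a time?

Sample table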

<table cellpadding="5" cellspacing="0" class="borders" width="100%">
    <tr>
     <th colspan="2">
      Learning Outcomes
     </th>
    </tr>
    <tr>
     <td class="info" colspan="2">
      On successful completion of this module the learner will be able to:
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO1
     </td>
     <td>
      Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO2
     </td>
     <td>
      Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO3
     </td>
     <td>
      Understand the various formats in which  information in relation to transactions or events is recorded and classified.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO4
     </td>
     <td>
      Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the  posting of  recorded information to the T accounts in the Nominal Ledger.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO5
     </td>
     <td>
      Prepare and present the financial statements of a Sole Trader  in prescribed format from a Trial Balance  accompanies by notes with additional information.
     </td>
    </tr>
   </table> 

Error

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-12673b1a4bfc> in <module>()
     10         #Read table into pandas
     11         if first:
---> 12             pddataframe = pd.read_html(LOTable,skiprows=2, flavor=['bs4'])
     13             first = False
     14             pddataframe

C:\Program Files\Anaconda3\envs\LearningOutcomes\lib\site-packages\pandas\io\html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding)
    872     _validate_header_arg(header)
    873     return _parse(flavor, io, match, header, index_col, skiprows,
--> 874                   parse_dates, tupleize_cols, thousands, attrs, encoding)

C:\Program Files\Anaconda3\envs\LearningOutcomes\lib\site-packages\pandas\io\html.py in _parse(flavor, io, match, header, index_col, skiprows, parse_dates, tupleize_cols, thousands, attrs, encoding)
    734             break
    735     else:
--> 736         raise_with_traceback(retained)
    737 
    738     ret = []

C:\Program Files\Anaconda3\envs\LearningOutcomes\lib\site-packages\pandas\compat\__init__.py in raise_with_traceback(exc, traceback)
    331         if traceback == Ellipsis:
    332             _, _, traceback = sys.exc_info()
--> 333         raise exc.with_traceback(traceback)
    334 else:
    335     # this version of raise is a syntax error in Python 3

**TypeError: 'NoneType' object is not callable**

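Thanks for all the suggested answers and the pointers in the comments. My rookie mistake was that, after extracting the table with bs4, I put it into a variable and passed that variable to pandas as it was. I ran pd.read_html(LOTable,skiprows=2, flavor='bs4') when I needed to run pd.read_html(LOTable.prettify(),skiprows=2, flavor='bs4')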
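
One plausible reading of the traceback, offered as a sketch and not as a statement about pandas internals: a bs4 Tag treats an unknown attribute as a lookup for a child tag of that name and yields None when there is no such child. A reader that checks whether its input has a `read` attribute and then calls it would therefore end up calling None. The `FakeTag` class and the `read_source` function below are hypothetical stand-ins written only for this illustration; they are not bs4 or pandas code.

```python
class FakeTag:
    """Hypothetical stand-in for a bs4 Tag: unknown attributes resolve to None."""

    def __init__(self, markup):
        self.markup = markup

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails.
        # Assumption: mimics a Tag searching for a child element called
        # `name` and finding none.
        return None

    def __str__(self):
        return self.markup


def read_source(source):
    """Hypothetical reader: file-like inputs are read, anything else is used as text."""
    if hasattr(source, "read"):   # True for FakeTag, because the lookup yields None
        return source.read()      # calling None raises TypeError
    return source


table = FakeTag("<table><tr><td>LO1</td></tr></table>")

try:
    read_source(table)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Handing over real text avoids the problem
# (str(table) here, LOTable.prettify() in the question).
text = read_source(str(table))
print(text)
```

Under these assumptions, converting the extracted table to a string before handing it to pd.read_html, as the fix above does with LOTable.prettify(), sidesteps the attribute lookup entirely.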

pandas can guess the parser, so there is no need to pass flavor.

HTML = '''\
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
    <tr>
     <th colspan="2">
      Learning Outcomes
     </th>


... omitting most of what you had here


      Prepare and present the financial statements of a Sole Trader  in prescribed format from a Trial Balance  accompanies by notes with additional information.
     </td>
    </tr>
   </table>'''

from io import StringIO
import pandas as pd

# read_html returns a list of DataFrames, one per <table> it finds
tables = pd.read_html(StringIO(HTML))
print(tables)

Result:

[                                                   0  \
0                                  Learning Outcomes   
1  On successful completion of this module the le...   
2                                                LO1   
3                                                LO2   
4                                                LO3   
5                                                LO4   
6                                                LO5   

                                                   1  
0                                                NaN  
1                                                NaN  
2  Demonstrate an awareness of the important role...  
3  Display an understanding of the fundamental ac...  
4  Understand the various formats in which inform...  
5  Apply a knowledge of accounting concepts,conve...  
6  Prepare and present the financial statements o...  ]

This exact code works for me.

htm = """<table cellpadding="5" cellspacing="0" class="borders" width="100%">
    <tr>
     <th colspan="2">
      Learning Outcomes
     </th>
    </tr>
    <tr>
     <td class="info" colspan="2">
      On successful completion of this module the learner will be able to:
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO1
     </td>
     <td>
      Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO2
     </td>
     <td>
      Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO3
     </td>
     <td>
      Understand the various formats in which  information in relation to transactions or events is recorded and classified.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO4
     </td>
     <td>
      Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the  posting of  recorded information to the T accounts in the Nominal Ledger.
     </td>
    </tr>
    <tr>
     <td style="width:10%;">
      LO5
     </td>
     <td>
      Prepare and present the financial statements of a Sole Trader  in prescribed format from a Trial Balance  accompanies by notes with additional information.
     </td>
    </tr>
   </table> 
"""

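import pandas as pd

# htm is a plain str, not a bs4 Tag. On pandas 2.1 and later, wrap it as
# io.StringIO(htm) to avoid the deprecation warning for literal HTML strings.
pd.read_html(htm, skiprows=2, flavor='bs4')[0]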

