Pandas read_html results in TypeError
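I am using bs4 to parse an HTML page and extract a table (a sample table is given below), and I am trying to load it into pandas. When I call pddataframe = pd.read_html(LOTable,skiprows=2, flavor=['bs4'])
I get the error listed below, even though I can print the table as prettified by bs4.
Any suggestions on how to resolve this without fetching each td and reading them in one by one?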
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="2">
Learning Outcomes
</th>
</tr>
<tr>
<td class="info" colspan="2">
On successful completion of this module the learner will be able to:
</td>
</tr>
<tr>
<td style="width:10%;">
LO1
</td>
<td>
Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
</td>
</tr>
<tr>
<td style="width:10%;">
LO2
</td>
<td>
Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
</td>
</tr>
<tr>
<td style="width:10%;">
LO3
</td>
<td>
Understand the various formats in which information in relation to transactions or events is recorded and classified.
</td>
</tr>
<tr>
<td style="width:10%;">
LO4
</td>
<td>
Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the posting of recorded information to the T accounts in the Nominal Ledger.
</td>
</tr>
<tr>
<td style="width:10%;">
LO5
</td>
<td>
Prepare and present the financial statements of a Sole Trader in prescribed format from a Trial Balance accompanies by notes with additional information.
</td>
</tr>
</table>
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-12673b1a4bfc> in <module>()
10 #Read table into pandas
11 if first:
---> 12 pddataframe = pd.read_html(LOTable,skiprows=2, flavor=['bs4'])
13 first = False
14 pddataframe
C:\Program Files\Anaconda3\envs\LearningOutcomes\lib\site-packages\pandas\io\html.py in read_html(io, match, flavor, header, index_col, skiprows, attrs, parse_dates, tupleize_cols, thousands, encoding)
872 _validate_header_arg(header)
873 return _parse(flavor, io, match, header, index_col, skiprows,
--> 874 parse_dates, tupleize_cols, thousands, attrs, encoding)
C:\Program Files\Anaconda3\envs\LearningOutcomes\lib\site-packages\pandas\io\html.py in _parse(flavor, io, match, header, index_col, skiprows, parse_dates, tupleize_cols, thousands, attrs, encoding)
734 break
735 else:
--> 736 raise_with_traceback(retained)
737
738 ret = []
C:\Program Files\Anaconda3\envs\LearningOutcomes\lib\site-packages\pandas\compat\__init__.py in raise_with_traceback(exc, traceback)
331 if traceback == Ellipsis:
332 _, _, traceback = sys.exc_info()
--> 333 raise exc.with_traceback(traceback)
334 else:
335 # this version of raise is a syntax error in Python 3
**TypeError: 'NoneType' object is not callable**
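Thanks for all the suggested answers and the pointers in the comments. My rookie mistake was that, after extracting the table with bs4, I left it in a variable as a bs4 object. I was running pd.read_html(LOTable,skiprows=2, flavor='bs4')
when I needed to run pd.read_html(LOTable.prettify(),skiprows=2, flavor='bs4')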
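The fix above can be sketched end to end as follows. This is a minimal illustration, not the original poster's script: `page_html` and `lo_table` are made-up names, the table is abbreviated, and it assumes `bs4` plus a pandas parser backend (`lxml`, or `html5lib`) are installed. The most likely cause of the `TypeError` is that a bs4 `Tag` treats an unknown attribute as a child-tag lookup, so `lo_table.read` evaluates to `None`; pandas sees a `read` attribute, assumes a file-like object, and tries to call it.

```python
from io import StringIO

import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page; only the table matters here.
page_html = """
<html><body>
<table class="borders">
  <tr><th colspan="2">Learning Outcomes</th></tr>
  <tr><td class="info" colspan="2">On successful completion ...</td></tr>
  <tr><td>LO1</td><td>Demonstrate an awareness ...</td></tr>
  <tr><td>LO2</td><td>Display an understanding ...</td></tr>
</table>
</body></html>
"""

soup = BeautifulSoup(page_html, "html.parser")
lo_table = soup.find("table", class_="borders")  # a bs4 Tag, not a string

# A Tag resolves unknown attributes by searching for a child tag of that
# name, so this prints None instead of raising AttributeError.
print(lo_table.read)

# Convert the Tag to markup text first; str() and .prettify() both work.
# Wrapping in StringIO avoids the literal-string deprecation warning that
# newer pandas versions emit.
frames = pd.read_html(StringIO(str(lo_table)))
df = frames[0]
print(df)
```

`skiprows` is left out of this sketch because how pandas infers header rows from `<th>` cells differs between versions; add it back once you have checked the frame you get.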
pandas can guess the parser, so there is no need to pass flavor.
HTML = '''\
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="2">
Learning Outcomes
</th>
... omitting most of what you had here
Prepare and present the financial statements of a Sole Trader in prescribed format from a Trial Balance accompanies by notes with additional information.
</td>
</tr>
</table>'''
from io import StringIO
import pandas as pd
df = pd.read_html(StringIO(HTML))
print(df)
Result:
[ 0 \
0 Learning Outcomes
1 On successful completion of this module the le...
2 LO1
3 LO2
4 LO3
5 LO4
6 LO5
1
0 NaN
1 NaN
2 Demonstrate an awareness of the important role...
3 Display an understanding of the fundamental ac...
4 Understand the various formats in which inform...
5 Apply a knowledge of accounting concepts,conve...
6 Prepare and present the financial statements o... ]
This exact code works for me.
htm = """<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="2">
Learning Outcomes
</th>
</tr>
<tr>
<td class="info" colspan="2">
On successful completion of this module the learner will be able to:
</td>
</tr>
<tr>
<td style="width:10%;">
LO1
</td>
<td>
Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
</td>
</tr>
<tr>
<td style="width:10%;">
LO2
</td>
<td>
Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
</td>
</tr>
<tr>
<td style="width:10%;">
LO3
</td>
<td>
Understand the various formats in which information in relation to transactions or events is recorded and classified.
</td>
</tr>
<tr>
<td style="width:10%;">
LO4
</td>
<td>
Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the posting of recorded information to the T accounts in the Nominal Ledger.
</td>
</tr>
<tr>
<td style="width:10%;">
LO5
</td>
<td>
Prepare and present the financial statements of a Sole Trader in prescribed format from a Trial Balance accompanies by notes with additional information.
</td>
</tr>
</table>
"""
import pandas as pd
pd.read_html(htm, skiprows=2, flavor='bs4')[0]