Shaping pandas read_html results into simpler structure
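I'm hoping someone can suggest how to create a pandas DataFrame that contains only column 2, without rows 1 and 2 and without the text in the left-hand column. The solution needs to handle multiple similar tables.
I thought that pd.read_html(LOTable.prettify(), skiprows=2, flavor='bs4'), which creates a list of DataFrames from the HTML (skipping 2 rows), would do it, but the resulting data structure is too confusing for this newbie to understand or manipulate into a simpler structure.
Does anyone have a way of handling the resulting structure, or can anyone recommend an alternative way of refining the data, so that I end up with 1 column containing only the text I need?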
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="2">
Learning Outcomes
</th>
</tr>
<tr>
<td class="info" colspan="2">
On successful completion of this module the learner will be able to:
</td>
</tr>
<tr>
<td style="width:10%;">
LO1
</td>
<td>
Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
</td>
</tr>
<tr>
<td style="width:10%;">
LO2
</td>
<td>
Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
</td>
</tr>
<tr>
<td style="width:10%;">
LO3
</td>
<td>
Understand the various formats in which information in relation to transactions or events is recorded and classified.
</td>
</tr>
<tr>
<td style="width:10%;">
LO4
</td>
<td>
Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the posting of recorded information to the T accounts in the Nominal Ledger.
</td>
</tr>
<tr>
<td style="width:10%;">
LO5
</td>
<td>
Prepare and present the financial statements of a Sole Trader in prescribed format from a Trial Balance accompanies by notes with additional information.
</td>
</tr>
</table>
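First option
Use iloc
This should work by having iloc drop the first column. Note that read_html returns a list of DataFrames, so pick the first one with [0] before indexing:
pd.read_html(LOTable.prettify(), skiprows=2, flavor='bs4')[0].iloc[:, 1:]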
Explanation
...iloc[:, 1:]
#       ^  ^
#       |   \
#  says to   says to take columns
#  take all  starting from position 1 onward
#  rows
You can also take just the single column:
pd.read_html(LOTable.prettify(), skiprows=2, flavor='bs4')[0].iloc[:, 1]
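To make the difference between the two slices concrete, here is a minimal sketch on a small hand-built DataFrame. The column names and the placeholder texts are made up for illustration and are not taken from the table in the question.

```python
import pandas as pd

# Hypothetical stand-in for a parsed table: a label column and a text column.
df = pd.DataFrame({
    "label": ["LO1", "LO2", "LO3"],
    "outcome": ["first outcome", "second outcome", "third outcome"],
})

# A slice of columns (1:) keeps the result two-dimensional: a one-column DataFrame.
without_labels = df.iloc[:, 1:]

# A single integer position (1) returns that column as a one-dimensional Series.
only_text = df.iloc[:, 1]

print(type(without_labels).__name__, without_labels.shape)  # DataFrame (3, 1)
print(type(only_text).__name__, only_text.shape)            # Series (3,)
```

Which form is more convenient depends on what happens next: the DataFrame form keeps the column header, while the Series form is easier to turn into a plain list with .tolist().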
Working code that I ran:
import pandas as pd

htm = """<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="2">
Learning Outcomes
</th>
</tr>
<tr>
<td class="info" colspan="2">
On successful completion of this module the learner will be able to:
</td>
</tr>
<tr>
<td style="width:10%;">
LO1
</td>
<td>
Demonstrate an awareness of the important role of Financial Accounting information as an input into the decision making process.
</td>
</tr>
<tr>
<td style="width:10%;">
LO2
</td>
<td>
Display an understanding of the fundamental accounting concepts, principles and conventions that underpin the preparation of Financial statements.
</td>
</tr>
<tr>
<td style="width:10%;">
LO3
</td>
<td>
Understand the various formats in which information in relation to transactions or events is recorded and classified.
</td>
</tr>
<tr>
<td style="width:10%;">
LO4
</td>
<td>
Apply a knowledge of accounting concepts,conventions and techniques such as double entry to the posting of recorded information to the T accounts in the Nominal Ledger.
</td>
</tr>
<tr>
<td style="width:10%;">
LO5
</td>
<td>
Prepare and present the financial statements of a Sole Trader in prescribed format from a Trial Balance accompanies by notes with additional information.
</td>
</tr>
</table> """
pd.read_html(htm, skiprows=2, flavor='bs4')[0].iloc[:, 1:]
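Since the question also asks about handling several similar tables, here is one possible sketch that does not depend on how many leading rows each table has. Instead of skiprows, it keeps only the rows whose first cell looks like a label such as LO1. The function name outcome_texts, the SAMPLE_HTML string, and the label pattern are assumptions made for this illustration. The HTML is wrapped in StringIO because recent pandas versions deprecate passing a literal HTML string, and the parser flavor is left at its default, so lxml (or bs4 with html5lib) needs to be installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample: two small tables shaped like the one in the question.
SAMPLE_HTML = """
<table>
<tr><th colspan="2">Learning Outcomes</th></tr>
<tr><td colspan="2">On successful completion the learner will be able to:</td></tr>
<tr><td>LO1</td><td>Explain the first idea.</td></tr>
<tr><td>LO2</td><td>Apply the second idea.</td></tr>
</table>
<table>
<tr><th colspan="2">Learning Outcomes</th></tr>
<tr><td colspan="2">On successful completion the learner will be able to:</td></tr>
<tr><td>LO1</td><td>Describe the third idea.</td></tr>
</table>
"""


def outcome_texts(html):
    """Return one Series of outcome texts per two-column table found in html."""
    results = []
    for df in pd.read_html(StringIO(html)):
        if df.shape[1] < 2:
            continue  # not shaped like a label/text table
        # Keep only rows whose first cell looks like "LO1", "LO2", ...
        labels = df.iloc[:, 0].astype(str).str.strip()
        keep = labels.str.match(r"LO\d+$")
        results.append(df.loc[keep].iloc[:, 1].reset_index(drop=True))
    return results


per_table = outcome_texts(SAMPLE_HTML)

# Stack everything into a single one-column DataFrame if that is more convenient.
combined = pd.concat(per_table, ignore_index=True).to_frame(name="outcome")
print(combined)
```

Filtering on the label column is a design choice, not the only option: if every table is known to have exactly two leading rows, the skiprows plus iloc approach above is shorter.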