Parsing HTML using BS4 in Python
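I am trying to parse a website that contains the following HTML.
I am using Python and BeautifulSoup.
How can I extract the text "Texas Rangers" from it?
The text is not inside an element with its own class, so I am having trouble. Thanks,
Matt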
<div class="team">
<span class="team-logo mlb tex"></span>Texas Rangers
<br />
<a class="fancy" href="/split_stats/index/Baseball/Pitcher/107">BvP</a>
·
<a class="fancy" href="/split_stats/index/Baseball/Righty/107">vs. R</a>
·
<a class="fancy" href="/split_stats/index/Baseball/Away/107">Away</a>
·
<a class="fancy" href="/split_stats/index/Baseball/Night/107">Night</a>
</div>
Probably not the best solution, but this works.
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(htmlCode)  # htmlCode holds the HTML from the question
>>> soup.div.contents[2].strip()
u'Texas Rangers'
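Indexing into `contents` depends on the exact whitespace in the markup, so it can break if the page changes slightly. A somewhat less position-dependent sketch is to find the logo `<span>` and read the text node that follows it. This assumes the team name always comes directly after that span, and the names `html_code`, `logo`, and `team_name` are only illustrative:

```python
from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (illustrative only).
html_code = """<div class="team">
<span class="team-logo mlb tex"></span>Texas Rangers
<br />
<a class="fancy" href="/split_stats/index/Baseball/Pitcher/107">BvP</a>
</div>"""

soup = BeautifulSoup(html_code, "html.parser")

# Find the logo span by one of its classes, then take the text node
# that immediately follows it inside the div.
logo = soup.find("span", class_="team-logo")
team_name = logo.next_sibling.strip()

print(team_name)
```

If the span might be missing on some pages, check that `logo` is not `None` before reading `next_sibling`.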
I would use the following code, run in ipython:
In [28]: htmldoc = """<div class="team">
....: <span class="team-logo mlb tex"></span>Texas Rangers
....: <br />
....: <a class="fancy" href="/split_stats/index/Baseball/Pitcher/107">BvP</a>
....: ·
....: <a class="fancy" href="/split_stats/index/Baseball/Righty/107">vs. R</a>
....: ·
....: <a class="fancy" href="/split_stats/index/Baseball/Away/107">Away</a>
....: ·
....: <a class="fancy" href="/split_stats/index/Baseball/Night/107">Night</a>
....: </div>
....: """
In [29]: from bs4 import BeautifulSoup
In [30]: soup = BeautifulSoup(htmldoc)
In [31]: import re
In [32]: soup(text=re.compile('Texas Rangers'))
Out[32]: [u'Texas Rangers\n']
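Note that the match still carries the trailing newline. As a rough sketch of the same idea as a script: in Beautiful Soup 4.4.0 and later the `text` argument is also available under the name `string`, and the results can be stripped afterwards. The variable names and the shortened HTML here are assumptions made for illustration:

```python
import re

from bs4 import BeautifulSoup

# Shortened copy of the HTML from the question (illustrative only).
html_doc = """<div class="team">
<span class="team-logo mlb tex"></span>Texas Rangers
<br />
<a class="fancy" href="/split_stats/index/Baseball/Away/107">Away</a>
</div>"""

soup = BeautifulSoup(html_doc, "html.parser")

# Collect every text node that matches the pattern, then strip the
# surrounding whitespace from each match.
matches = soup.find_all(string=re.compile("Texas Rangers"))
team_names = [m.strip() for m in matches]

print(team_names)
```

This only helps when the team name is known in advance. To pull out whatever name appears in the div, navigating from the surrounding tags is the more general route.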