Parsing HTML with BS4 in Python

Question

I am trying to parse a website with the following HTML.

I am using Python and BeautifulSoup.

How do I extract the text "Texas Rangers" from it?

I am having trouble because the text is not wrapped in an element with its own class.

Thanks,
Matt

<div class="team">
    <span class="team-logo mlb tex"></span>Texas Rangers
    <br />
    <a class="fancy" href="/split_stats/index/Baseball/Pitcher/107">BvP</a>
    &middot;
    <a class="fancy" href="/split_stats/index/Baseball/Righty/107">vs. R</a>
    &middot;
    <a class="fancy" href="/split_stats/index/Baseball/Away/107">Away</a>
    &middot;
    <a class="fancy" href="/split_stats/index/Baseball/Night/107">Night</a>
</div>

Answer 1

This may not be the best solution, but it works.

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(htmlCode, "html.parser")
>>> soup.div.contents[2].strip()
u'Texas Rangers'
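
Indexing into `contents` depends on the exact whitespace and element order inside the div. A variation that avoids the hard-coded index is sketched below. It assumes the team name is always the text node immediately after the `team-logo` span. The variable name `html_code`, the shortened sample markup, and the choice of `html.parser` are illustrative assumptions, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Shortened copy of the markup from the question (assumed sample input).
html_code = """<div class="team">
    <span class="team-logo mlb tex"></span>Texas Rangers
    <br />
    <a class="fancy" href="/split_stats/index/Baseball/Pitcher/107">BvP</a>
</div>"""

soup = BeautifulSoup(html_code, "html.parser")

# Find the logo span by one of its classes, then read the text node
# that directly follows it and trim the surrounding whitespace.
logo = soup.find("span", class_="team-logo")
team_name = logo.next_sibling.strip()
print(team_name)
```

If the span might be missing on some pages, check that `logo` is not `None` before reading `next_sibling`.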

Answer 2

I would use the following code, which I ran in IPython:

In [28]: htmldoc = """<div class="team">
   ....: <span class="team-logo mlb tex"></span>Texas Rangers
   ....: <br />
   ....: <a class="fancy" href="/split_stats/index/Baseball/Pitcher/107">BvP</a>
   ....: &middot;
   ....: <a class="fancy" href="/split_stats/index/Baseball/Righty/107">vs. R</a>
   ....: &middot;
   ....: <a class="fancy" href="/split_stats/index/Baseball/Away/107">Away</a>
   ....: &middot;
   ....: <a class="fancy" href="/split_stats/index/Baseball/Night/107">Night</a>
   ....: </div>
   ....: """

In [29]: from bs4 import BeautifulSoup

In [30]: soup = BeautifulSoup(htmldoc, "html.parser")

In [31]: import re

In [32]: soup(text=re.compile('Texas Rangers'))
Out[32]: [u'Texas Rangers\n']
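
The regular-expression search needs the team name up front, and the match keeps its trailing newline. When the name is not known in advance, one option is to collect the text nodes that are direct children of the div and drop the empty ones. This is only a sketch: the helper name `direct_text` is hypothetical, the sample markup is shortened, and it assumes the team name is the first non-empty text node in the div.

```python
from bs4 import BeautifulSoup, NavigableString

# Shortened copy of the markup from the question (assumed sample input).
htmldoc = """<div class="team">
<span class="team-logo mlb tex"></span>Texas Rangers
<br />
<a class="fancy" href="/split_stats/index/Baseball/Pitcher/107">BvP</a>
&middot;
<a class="fancy" href="/split_stats/index/Baseball/Away/107">Away</a>
</div>"""


def direct_text(tag):
    """Return stripped, non-empty text nodes that are direct children of tag."""
    pieces = (
        child.strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    )
    return [piece for piece in pieces if piece]


soup = BeautifulSoup(htmldoc, "html.parser")
team_div = soup.find("div", class_="team")

# Link text such as "BvP" lives inside <a> tags, so it is not a direct child
# and is skipped; the separator characters between links are still included.
texts = direct_text(team_div)
team_name = texts[0]
print(team_name)
```

Taking the first entry relies on the name appearing before any separator text, which holds for the markup shown in the question.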


Answer 1
2 2014-07-23 03:05:54

Answer 2
0 2014-07-23 03:15:29
