Parsing text from HTML in Python 3

Question

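I'm trying to get text from a web page using Python 3.3 and then search that text for certain strings. When I find a matching string, I need to save the text that follows it. For example, I take this page: http://gatherer.wizards.com/Pages/Card/Details.aspx?name=Dark%20Prophecy and I need to save the text after each category (card text, rarity, etc.) in the card info. Currently I'm using Beautiful Soup, but get_text causes a UnicodeEncodeError and doesn't return an iterable object. Here is the relevant code: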

    urlStr = urllib.request.urlopen('http://gatherer.wizards.com/Pages/Card/Details.aspx?name=' + cardName).read()
    htmlRaw = BeautifulSoup(urlStr)
    htmlText = htmlRaw.get_text

    for line in htmlText:
        line = line.strip()
        if "Converted Mana Cost:" in line:
            cmc = line.next()
            message += "*Converted Mana Cost: " + cmc +"* \n\n"
        elif "Types:" in line:
            type = line.next()
            message += "*Type: " + type +"* \n\n"
        elif "Card Text:" in line:
            rulesText = line.next()
            message += "*Rules Text: " + rulesText +"* \n\n"
        elif "Flavor Text:" in line:
            flavor = line.next()
            message += "*Flavor Text: " + flavor +"* \n\n"
        elif "Rarity:" in line:
            rarity == line.next()
            message += "*Rarity: " + rarity +"* \n\n"

Answer 1

Consider using lxml and XPath instead; you will be able to do something like this:

>>> from lxml import html
>>> root = html.parse("http://gatherer.wizards.com/Pages/Card/Details.aspx?name=Dark%20Prophecy")
>>> root.xpath('//div[contains(text(), "Flavor Text")]/following-sibling::div/div/i/text()')
['When the bog ran short on small animals, Ekri turned to the surrounding farmlands.']
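
If you would rather keep the line-scanning idea from the question, here is a minimal sketch of the "find a label, save the text that follows" step using only the standard library. The function name `extract_fields` and the `sample_text` are made up for illustration (the sample is hand-written, not fetched from the page), and the sketch assumes each value sits either on the same line as its label or on the next non-empty line, which may not hold for every page layout.

```python
def extract_fields(lines, labels):
    """Map each label to the text that follows it.

    Hypothetical helper: assumes the value is on the same line as the
    label or on the next non-empty line.
    """
    fields = {}
    stripped = (line.strip() for line in lines)
    it = (line for line in stripped if line)  # skip blank lines
    for line in it:
        for label in labels:
            if line.startswith(label):
                # Use the rest of the line if present; otherwise pull
                # the next non-empty line from the same iterator.
                value = line[len(label):].strip() or next(it, "")
                fields[label] = value
                break
    return fields


# Hand-written sample standing in for the page text.
sample_text = """
Converted Mana Cost:

    3
Types:
    Enchantment
Rarity:
    Rare
"""

labels = ["Converted Mana Cost:", "Types:", "Rarity:"]
fields = extract_fields(sample_text.splitlines(), labels)

# Build the message in the same "*Label value*" style as the question.
message = "".join("*{} {}*\n\n".format(k, v) for k, v in fields.items())
print(message)
```

With Beautiful Soup, the lines could come from `htmlRaw.get_text().splitlines()`. Note the parentheses: `htmlRaw.get_text` without them is a bound method rather than a string, so a `for` loop over it fails, and strings have no `next()` method, which is why the sketch advances a shared iterator instead.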

Solution 1
1 2014-01-27 19:01:21
