Beautiful Soup error

Question

I need to extract the text in the title tag from this source:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html dir="ltr" lang="en">
<head>
    <title>Microsoft to acquire Nokia’s devices &amp; services business, license Nokia’s patents and mapping services</title>
    <meta http-equiv="X-UA-Compatible" content="IE=EmulateIE9; IE=10" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta id="ctl00_WtCampaignId" name="DCSext.wt_linkid" />
    </title>

I am using this to extract the text:

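import urllib2
from bs4 import BeautifulSoup  # assumed import; the original snippet omits it

# Use a browser-like User-Agent header for the request.
opener = urllib2.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

ourUrl = opener.open("http://www.thehindubusinessline.com/industry-and-economy/info-tech/nokia-cannot-license-brand-nokia-post-microsoft-deal/article5156470.ece").read()

soup = BeautifulSoup(ourUrl)
print soup
dem = soup.findAll('p')      # all <p> tags: this works
hea = soup.findAll('title')  # all <title> tags: this is the call that fails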

This code correctly extracts the p tags but fails when trying to extract the title. I have only included part of the code; don't worry, the rest of it works fine. Thanks.

Answer 1

There is an error in your HTML! You have two </title> end tags:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html dir="ltr" lang="en">
<head>
    <title>Microsoft to acquire Nokia’s devices &amp; services business, license Nokia’s patents and mapping services</title>
    <meta http-equiv="X-UA-Compatible" content="IE=EmulateIE9; IE=10" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta id="ctl00_WtCampaignId" name="DCSext.wt_linkid" />
    </title> <!-- stray end tag: <title> was already closed above -->

So the corrected HTML should look like this:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html dir="ltr" lang="en">
<head>
    <title>Microsoft to acquire Nokia’s devices &amp; services business, license Nokia’s patents and mapping services</title>
    <meta http-equiv="X-UA-Compatible" content="IE=EmulateIE9; IE=10" />
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta id="ctl00_WtCampaignId" name="DCSext.wt_linkid" />
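
To check a page for this kind of problem, here is a minimal sketch that uses Python 3's standard-library html.parser instead of Beautiful Soup. It collects the title text and records any end tag that has no matching open tag. The class name TitleAndStrayTagFinder, the VOID_TAGS set, and the shortened sample string are all hypothetical, introduced only for illustration.

```python
from html.parser import HTMLParser

# Void elements never take an end tag, so they are not tracked on the stack.
VOID_TAGS = {"meta", "link", "br", "img", "hr", "input"}


class TitleAndStrayTagFinder(HTMLParser):
    """Collect the <title> text and record end tags that have no open tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []       # stack of currently open tags
        self.stray_end_tags = []  # end tags seen without a matching start tag
        self.title_parts = []     # text chunks seen inside <title>

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # e.g. the implicit end of <meta ... />
        if tag in self.open_tags:
            # Pop back to (and including) the matching start tag.
            while self.open_tags.pop() != tag:
                pass
        else:
            self.stray_end_tags.append(tag)

    def handle_data(self, data):
        if self.open_tags and self.open_tags[-1] == "title":
            self.title_parts.append(data)

    @property
    def title(self):
        return "".join(self.title_parts).strip()


# Shortened stand-in for the head of the page in the question.
sample = """<html><head>
    <title>Microsoft to acquire Nokia</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    </title>
</head></html>"""

finder = TitleAndStrayTagFinder()
finder.feed(sample)
finder.close()
print(finder.title)
print(finder.stray_end_tags)
```

If stray_end_tags is not empty, the markup has an unmatched closing tag like the second </title> above, and it is worth cleaning the HTML or trying a more lenient parser before handing it to Beautiful Soup.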

Answer 1
0 accepted 2013-09-23 08:53:58