How to remove h2 tags from HTML data using beautifulsoup4?

Question

I have been using BeautifulSoup to get data from a website. I want to remove the h2 tags from the output I get after running the following script.

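import mechanize
from bs4 import BeautifulSoup

url = 'http://diningdelights.in/Normal-mum.aspx'
br = mechanize.Browser()
br.open(url)

# Pick the form whose id is "form1"
def select_form(form):
    return form.attrs.get('id', None) == 'form1'
br.select_form(predicate=select_form)
br.form.set_all_readonly(False)  # allow writing to the hidden field
br.form["hdnPageSearch"] = '3'
br.submit()

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(br.response().read(), "html.parser")

for g_data in soup.find_all("div", class_="innerContainer"):
    h2_data = g_data.find_all("h2")  # a list of h2 elements
    print(h2_data)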

This prints the h2 elements with their tags still attached. For example:

<h2> Evergreen </h2>, <h2> Evergreen</h2>

Could somebody help me remove these tags?

Answer 1

You are printing the elements themselves. If you want the text inside an element, retrieve that instead, for example with the .string attribute:

# h2_data from find_all() is a list, so read .string from each element
for h2 in h2_data:
    print(h2.string)
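To make the difference between printing an element and printing its text concrete, here is a minimal sketch that runs without the network. The HTML string is a hypothetical stand-in for the scraped page, the helper name `h2_texts` is made up for illustration, and `get_text(strip=True)` is used in place of `.string` only so that the surrounding whitespace is trimmed as well.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page returned by br.response().read()
html = """
<div class="innerContainer">
  <h2> Evergreen </h2>
  <h2> Evergreen</h2>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def h2_texts(soup):
    """Return the text of every h2 inside div.innerContainer, without tags."""
    texts = []
    for container in soup.find_all("div", class_="innerContainer"):
        # find_all() returns a list of Tag objects, so loop over it
        for h2 in container.find_all("h2"):
            # get_text(strip=True) drops the tags and trims whitespace
            texts.append(h2.get_text(strip=True))
    return texts


print(h2_texts(soup))
```

Printing `h2` itself inside the loop would show the full `<h2>...</h2>` markup, which is what the question's script does.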

You can simplify your search using a CSS selector:

for h2_data in soup.select("div.innerContainer h2"):
    print(h2_data.string)
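One caveat worth knowing with `.string`: it is `None` when an element has more than one child, for instance when the heading contains nested markup. The sketch below uses a made-up HTML snippet (not the real page) to compare `.string` with `get_text()` on elements matched by the same CSS selector.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second h2 contains a nested <b> tag
html = """
<div class="innerContainer">
  <h2> Evergreen </h2>
  <h2>Ever<b>green</b></h2>
</div>
<h2>Outside the container</h2>
"""

soup = BeautifulSoup(html, "html.parser")

# Only h2 elements that are descendants of div.innerContainer match
headings = soup.select("div.innerContainer h2")

# .string works only when the element has a single child
strings = [h2.string for h2 in headings]

# get_text() concatenates all text inside the element, nested tags included
texts = [h2.get_text(strip=True) for h2 in headings]

print(strings)
print(texts)
```

If the headings on the real page are plain text, `.string` is enough; `get_text()` is the safer choice when you do not control the markup.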

Answer 1
0 2015-02-09 22:25:17