How to get the text from a <p class=" "> tag using BeautifulSoup4

Question

I am scraping some webpages and trying to get the plain text from all <p> </p> tags. In one particular instance I am encountering a 'p' tag with a class:

<p class="SimpleBlock-module_p__Q3azD "> Some text here. </p>

Now using a simple:

Text = soup.findAll("p")

Results in:

Text = SimpleBlock-module_p__Q3azD  Some text here.

How do I get only the text part, excluding the class name, from Text above?

I want a general solution that applies in all situations, whether or not the 'p' tags have a class.

I am using Python 3, requests, and BeautifulSoup4 on Windows 10.

Answer 1

Try this:

from bs4 import BeautifulSoup

p = """<p class="SimpleBlock-module_p__Q3azD "> Some text here. </p>"""
print(BeautifulSoup(p, "html.parser").find("p").getText(strip=True))

Output:

Some text here.
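Since the question asks for something that works across every 'p' tag, here is a minimal sketch that extends the same idea to a whole page. The sample markup in `html` and the `paragraphs` name are made up for illustration; in practice the markup would come from the response you fetched with requests.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one <p> with a class and one without.
html = """
<p class="SimpleBlock-module_p__Q3azD "> Some text here. </p>
<p>Another paragraph.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text() returns only the text inside each tag; attributes such as
# class are not part of it. strip=True trims surrounding whitespace.
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]

print(paragraphs)
```

This should behave the same whether or not a given 'p' tag carries a class, because the class is an attribute of the tag rather than part of its text.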

Answer 2

In BeautifulSoup 4, findAll is the older BeautifulSoup 3 name. It still works as an alias, but find_all is the preferred method name.

find_all returns a list of tags, so in your example you should access an element by index and read its text with:

Text[0].string
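One caveat worth knowing, sketched below with made-up markup: .string only returns text when the tag has a single child, so for a 'p' tag that contains nested tags it gives None, while get_text() still collects all the text. The variable names here are illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a plain <p> with a class, and a <p> with a nested <b>.
html = '<p class="a"> Some text here. </p><p>Some <b>bold</b> text.</p>'
soup = BeautifulSoup(html, "html.parser")
tags = soup.find_all("p")

# Single text child: .string returns it, including the surrounding spaces.
first = tags[0].string

# More than one child: .string is None, so get_text() is the safer choice.
nested = tags[1].string
nested_text = tags[1].get_text()

print(repr(first.strip()), nested, repr(nested_text))
```

If the pages you scrape may contain inline markup inside paragraphs, get_text() is probably the more general option of the two.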

Answer 1: 0, 2021-02-14 07:58:00
Answer 2: 0, 2021-02-14 08:00:27
