How do I return only the text from an HTML snippet?

Question

I have an HTML snippet that looks like this:

<pre>zdfsfsf<br/>adfadfadf
adfadfasdfadfad  adfadf adf 
Mill Valley, CA 94941
122-2323-24124
Email: adfadfadf<br/><i>sfsfsfsf</i></pre>
<br/>

I want to strip all tags and keep just the text.

The content should look like this:

zdfsfsf adfadfadf
adfadfasdfadfad  adfadf adf 
Mill Valley, CA 94941
122-2323-24124
Email: adfadfadf sfsfsfsf

I'm looking for something like this:

cells = row.find_all('td')
for c in cells:
    c.STRIP_HTML_TAGS()?????? <--WHAT IS THIS FUNCTION?

Answer 1

You're looking for get_text():

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup("""<pre>zdfsfsf<br/>adfadfadf
... adfadfasdfadfad  adfadf adf 
... Mill Valley, CA 94941
... 122-2323-24124
... Email: adfadfadf<br/><i>sfsfsfsf</i></pre>
... <br/>""", "html.parser")
>>> print(soup.get_text())
zdfsfsfadfadfadf
adfadfasdfadfad  adfadf adf 
Mill Valley, CA 94941
122-2323-24124
Email: adfadfadfsfsfsfsf
>>>
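
Note that plain get_text() joins the text on either side of a <br/> or <i> tag with nothing in between, whereas the question asks for a space there. A minimal sketch of one way to get that, using get_text()'s separator argument inside the loop over cells from the question; the table markup and the shortened snippet are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical table row wrapping a shortened version of the snippet
# from the question; the real page structure is not shown in the thread.
html = (
    "<table><tr><td><pre>zdfsfsf<br/>adfadfadf\n"
    "Mill Valley, CA 94941\n"
    "Email: adfadfadf<br/><i>sfsfsfsf</i></pre></td></tr></table>"
)

soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

texts = []
for cell in row.find_all("td"):
    # separator=" " puts a space between adjacent text nodes, so text on
    # either side of a <br/> or inline tag is not glued together.
    # .strip() trims leading/trailing whitespace from the joined result.
    texts.append(cell.get_text(separator=" ").strip())

print(texts[0])
```

Here the first line prints as "zdfsfsf adfadfadf" rather than "zdfsfsfadfadfadf", and the newlines that were already inside the <pre> block are kept as they are.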

Answer 1 (3, accepted, 2013-06-20 01:36:55)
