How to extract text from between HTML tags?
I have some HTML elements from which I want to extract the text. The HTML looks like this:
<pre>
<span class="ansi-red-fg">ZeroDivisionError</span>Traceback (most recent call last)
<span class="ansi-green-fg">&lt;ipython-input-2-0f9f90da76dc&gt;</span> in <span class="ansi-cyan-fg">&lt;module&gt;</span><span class="ansi-blue-fg">()</span>
</pre>
and I want to extract the text as:
ZeroDivisionErrorTraceback (most recent call last)
<ipython-input-2-0f9f90da76dc> in <module>()
I found an answer to that issue here, but it does not work for me. Complete example code:
from bs4 import BeautifulSoup as BSHTML
bs = BSHTML("""<pre>
<span class="ansi-red-fg">ZeroDivisionError</span>Traceback (most recent call last)
<span class="ansi-green-fg">&lt;ipython-input-2-0f9f90da76dc&gt;</span> in <span class="ansi-cyan-fg">&lt;module&gt;</span><span class="ansi-blue-fg">()</span>
</pre>""")
print bs.font.contents[0].strip()
which gives the following error:
Traceback (most recent call last):
File "invest.py", line 13, in <module>
print bs.font.contents[0].strip()
AttributeError: 'NoneType' object has no attribute 'contents'
Is there anything I am missing? BeautifulSoup version: 4.6.0
Do you want all the text content of that <pre> block?
print(bs.pre.text)
Output:
ZeroDivisionErrorTraceback (most recent call last)
<ipython-input-2-0f9f90da76dc> in <module>()
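A Python 3 version of the same idea, as a sketch. It passes the built-in "html.parser" explicitly instead of letting BeautifulSoup pick a parser, and it assumes the angle brackets around ipython-input-2-0f9f90da76dc and module are escaped as &lt; and &gt; in the source HTML (that is what lets them appear as text in the output above rather than being parsed as tags). Calling .strip() removes the newlines just inside <pre>.

```python
from bs4 import BeautifulSoup

# Assumption: the traceback's angle brackets are HTML-escaped in the source,
# so they come back as literal "<...>" text instead of being parsed as tags.
HTML = """<pre>
<span class="ansi-red-fg">ZeroDivisionError</span>Traceback (most recent call last)
<span class="ansi-green-fg">&lt;ipython-input-2-0f9f90da76dc&gt;</span> in <span class="ansi-cyan-fg">&lt;module&gt;</span><span class="ansi-blue-fg">()</span>
</pre>"""

soup = BeautifulSoup(HTML, "html.parser")

# soup.pre is the first <pre> tag; get_text() concatenates every string
# inside it, including the text that sits between the <span> tags.
pre_text = soup.pre.get_text().strip()
print(pre_text)
```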
The .font in your code sample refers to the HTML tag <font>. Since you are instead looking for all the text in your document, you can use something like this:
contents = bs.find_all(text=True)
for c in contents:
    print(c)  # replace this with whatever you're trying to do
Output:
ZeroDivisionError
Traceback (most recent call last)
<ipython-input-2-0f9f90da76dc>
in
<module>
()
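Some of the strings returned by find_all(text=True) are whitespace-only, such as the newlines just inside <pre>. A sketch of an alternative, assuming escaped &lt;/&gt; markup and the built-in "html.parser": stripped_strings yields only the non-blank text nodes, already trimmed, which gives the six fragments listed above. For comparison it uses string=True, the newer name for the text=True argument (available since Beautiful Soup 4.4.0).

```python
from bs4 import BeautifulSoup

# Assumption: the traceback's angle brackets are HTML-escaped in the source.
HTML = """<pre>
<span class="ansi-red-fg">ZeroDivisionError</span>Traceback (most recent call last)
<span class="ansi-green-fg">&lt;ipython-input-2-0f9f90da76dc&gt;</span> in <span class="ansi-cyan-fg">&lt;module&gt;</span><span class="ansi-blue-fg">()</span>
</pre>"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all(string=True) returns every text node, including
# whitespace-only ones such as the newlines just inside <pre>.
all_strings = soup.find_all(string=True)

# stripped_strings skips whitespace-only nodes and strips the rest.
fragments = list(soup.stripped_strings)

print(len(all_strings), "text nodes,", len(fragments), "non-blank")
for fragment in fragments:
    print(fragment)
```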
Currently bs.font is None because you are parsing a document that doesn't contain any <font> tags.
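To see where the AttributeError comes from: using a tag name as an attribute, as in bs.font, is shorthand for find("font"), and it returns None when the document has no such tag. A minimal sketch (Python 3, built-in "html.parser", markup shortened for illustration) that checks for None before dereferencing:

```python
from bs4 import BeautifulSoup

# A shortened version of the question's markup; it has no <font> tag.
HTML = '<pre><span class="ansi-red-fg">ZeroDivisionError</span>Traceback</pre>'

soup = BeautifulSoup(HTML, "html.parser")

# Attribute-style access is shorthand for soup.find("font"); with no
# <font> tag in the document, both return None.
font = soup.font
print(font is None, soup.find("font") is None)

# Guard the lookup instead of calling .contents on a possible None.
pre = soup.find("pre")
text = pre.get_text() if pre is not None else ""
print(text)
```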
If you just want the contents as one long string, you can get it with bs.text:
'\nZeroDivisionErrorTraceback (most recent call last)\n<ipython-input-2-0f9f90da76dc> in <module>()\n'
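bs.text returns the same string as bs.get_text(). To get the two lines from the question without the surrounding newlines, one option (a sketch, assuming escaped &lt;/&gt; markup and the built-in "html.parser") is to split that string on line breaks and drop the blank lines:

```python
from bs4 import BeautifulSoup

# Assumption: the traceback's angle brackets are HTML-escaped in the source.
HTML = """<pre>
<span class="ansi-red-fg">ZeroDivisionError</span>Traceback (most recent call last)
<span class="ansi-green-fg">&lt;ipython-input-2-0f9f90da76dc&gt;</span> in <span class="ansi-cyan-fg">&lt;module&gt;</span><span class="ansi-blue-fg">()</span>
</pre>"""

soup = BeautifulSoup(HTML, "html.parser")

# .text is a property returning the same string as get_text().
whole = soup.text

# Split into lines and drop the empty ones left by the newlines inside <pre>.
lines = [line.strip() for line in whole.splitlines() if line.strip()]
for line in lines:
    print(line)
```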