
Python: Injecting HTML content into a tag using `lxml.html`

I'm using the lxml.html library to parse an HTML document.

I have located a specific tag, which I call content_tag, and I want to change its content (i.e. the text between <div> and </div>). The new content is a string with some HTML in it, say 'Hello <b>world!</b>'.

How do I do that? I tried content_tag.text = 'Hello <b>world!</b>', but that escapes all the HTML tags, replacing < with &lt; and so on.

I want to inject the text without escaping any HTML. How can I do that?
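
To make the symptom concrete, here is a minimal sketch (assuming lxml is installed; the variable names are illustrative and not from the original post). It shows that a string assigned to .text is treated as literal text, so no <b> element is created and the markup is escaped on output:

```python
import lxml.html

content_tag = lxml.html.fromstring('<div>Goodbye.</div>')

# Assigning to .text stores plain character data, not markup.
content_tag.text = 'Hello <b>world!</b>'

# encoding='unicode' makes tostring() return a str instead of bytes.
serialized = lxml.html.tostring(content_tag, encoding='unicode')
child_count = len(content_tag)  # number of child elements under the div

print(serialized)
print(child_count)
```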

This is one way:

#!/usr/bin/env python3
from lxml.html import fromstring, tostring
from lxml.html import builder as E

fragment = """\
<div id="outer">
  <div id="inner">This is div.</div>
</div>"""

div = fromstring(fragment)
print(tostring(div, encoding='unicode'))
# <div id="outer">
#   <div id="inner">This is div.</div>
# </div>

# Swap the inner <div> for a new one built with the E factory.
div.replace(div.get_element_by_id('inner'), E.DIV('Hello ', E.B('world!')))
print(tostring(div, encoding='unicode'))
# <div id="outer">
#   <div>Hello <b>world!</b></div></div>

See also: http://lxml.de/lxmlhtml.html#creating-html-with-the-e-factory

Edit: I should have confessed earlier that I'm not all that familiar with lxml. I looked at the docs and source briefly, but didn't find a clean solution. Perhaps someone more familiar will stop by and set us both straight.

In the meantime, this seems to work, but is not well tested:

import lxml.html

content_tag = lxml.html.fromstring('<div>Goodbye.</div>')
content_tag.text = ''  # assumes the tag holds only text to start with
for elem in lxml.html.fragments_fromstring('Hello <b>world!</b>'):
    if isinstance(elem, str):  # only the first fragment can be a plain string
        content_tag.text += elem
    else:
        content_tag.append(elem)
print(lxml.html.tostring(content_tag, encoding='unicode'))
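
It may help to look at what fragments_fromstring actually hands back. The sketch below uses a made-up input string (an assumption, not from the original post): leading text arrives as a plain string in first position, while text that follows an element is stored on that element's .tail rather than returned as a separate item.

```python
import lxml.html

fragments = lxml.html.fragments_fromstring(
    'Hello <b>world!</b> and <i>more</i> text')

leading_text = fragments[0]   # plain str holding the text before the first tag
bold, italic = fragments[1], fragments[2]

print(repr(leading_text))
print(bold.tag, repr(bold.tail))      # text after </b> lives in bold.tail
print(italic.tag, repr(italic.tail))  # text after </i> lives in italic.tail
```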

Edit again: this version removes the existing text and children first:

somehtml = 'Hello <b>world!</b>'

# Purge the element's contents (text and child elements).
content_tag.text = ''
for child in list(content_tag):
    content_tag.remove(child)

# Leading text, if any, comes back as a plain string in first position.
fragments = lxml.html.fragments_fromstring(somehtml)
if fragments and isinstance(fragments[0], str):
    content_tag.text = fragments.pop(0)
content_tag.extend(fragments)

After tinkering around, I found this solution:

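import lxml.etree
import lxml.html

html_to_inject = 'Hello <b>world!</b>'  # the string with tags to inject
fragments = lxml.html.fragments_fromstring(html_to_inject)
last = None

for frag in fragments:
    if lxml.etree.iselement(frag):
        content_tag.append(frag)
        last = frag
    else:
        # Compare against None: an element without children is falsy.
        if last is not None:
            last.tail = frag
        else:
            content_tag.text = frag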
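
One caveat when an lxml element is used as a condition: elements behave like sequences of their children, so an element with no child elements counts as empty even though it exists. A small sketch with illustrative names, using explicit checks instead of truth-testing:

```python
import lxml.html

# A single element that contains text but no child elements.
bold = lxml.html.fragment_fromstring('<b>world!</b>')

has_children = len(bold) > 0   # how many child elements it holds
exists = bold is not None      # the reliable "do I have an element?" test

print(has_children, exists)
```

For that reason, "is not None" is the safer check when tracking the last appended element.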

Assuming content_tag doesn't have any subelements, you can just do:

from lxml import html
from lxml.html.builder import B

...

content_tag.text = 'Hello '
content_tag.append(B('world!'))
print(html.tostring(content_tag, encoding='unicode'))
