
Replace text with HTML tag in LXML text element

I have an lxml element:

>>> lxml_element.text
'hello BREAK world'

I need to replace the word BREAK with an HTML break tag, <br />. I've tried a simple text replacement:

lxml_element.text.replace('BREAK', '<br />')

but it inserts the tag with escaped symbols, like &lt;br/&gt;. How do I solve this problem?

Here's how you could do it. First, set up a sample lxml element based on your question:

>>> import lxml.etree
>>> some_data = "<b>hello BREAK world</b>"
>>> root = lxml.etree.fromstring(some_data)
>>> root
<Element b at 0x3f35a50>
>>> root.text
'hello BREAK world'

Next, create a <br> subelement:

>>> childbr = lxml.etree.SubElement(root, "br")
>>> childbr
<Element br at 0x3f35b40>
>>> lxml.etree.tostring(root)
'<b>hello BREAK world<br/></b>'

But that's not all you want. You have to take the text that comes before the <br> and place it in .text:

>>> root.text = "hello"
>>> lxml.etree.tostring(root)
'<b>hello<br/></b>'

Then set the .tail of the child to contain the rest of the text:

>>> childbr.tail = "world"
>>> lxml.etree.tostring(root)
'<b>hello<br/>world</b>'

Well, I don't think you want to just change the text node of the element. What I think you want to do is modify the text node of your Element, add a SubElement named br to your lxml_element, and then set the tail attribute of that subelement to the second part of the string you are parsing. I found the tutorial here: http://lxml.de/tutorial.html#the-element-class to be very useful.
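To see why the text has to be split between .text and .tail, it may help to look at how the parser itself represents mixed content. The sketch below parses the target markup and reads the pieces back; the fallback to the standard library's xml.etree.ElementTree is an assumption for environments without lxml.

```python
try:
    from lxml import etree
except ImportError:  # assumption: the stdlib ElementTree is an acceptable stand-in
    import xml.etree.ElementTree as etree

root = etree.fromstring("<b>hello<br/>world</b>")
br = root[0]

print(repr(root.text))  # text before the first child lives on the parent
print(br.tag)           # the child element itself
print(repr(br.tail))    # text after the child lives on the child's .tail
```

Text that follows a child element belongs to that child, not to the parent, so assigning a string containing "<br />" to .text can only ever produce escaped text.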
