How to prevent lxml from converting '&' character to '&amp;'?

Question

I need to send the control characters &#x0D; and &#x0A; in my XML file so that the text is displayed correctly in the target system.

For the creation of the XML file I use the lxml library. This is my attempt:

from lxml import etree as et
import lxml.builder

e = lxml.builder.ElementMaker()

xml_doc = e.newOrderRequest(
    e.Orders(
        e.Order(
            e.OrderNumber('12345'),
            e.OrderID('001'),
            e.Articles(
                e.Article(
                    e.ArticleNumber('000111'),
                    e.ArticleName('Logitec Mouse'),
                    e.ArticleDescription('* 4 Buttons&#x0D;&#x0A;* 600 DPI&#x0D;&#x0A;* Bluetooth')
                )
            )
        )
    )
)

tree = et.ElementTree(xml_doc)
tree.write('output.xml', pretty_print=True, xml_declaration=True, encoding="utf-8")

This is the result:

<?xml version='1.0' encoding='UTF-8'?>
<newOrderRequest>
  <Orders>
    <Order>
      <OrderNumber>12345</OrderNumber>
      <OrderID>001</OrderID>
      <Articles>
        <Article>
          <ArticleNumber>000111</ArticleNumber>
          <ArticleName>Logitec Mouse</ArticleName>
          <ArticleDescription>* 4 Buttons&amp;#x0D;&amp;#x0A;* 600 DPI&amp;#x0D;&amp;#x0A;* Bluetooth</ArticleDescription>
        </Article>
      </Articles>
    </Order>
  </Orders>
</newOrderRequest>

This is what I need:

<ArticleDescription>* 4 Buttons&#x0D;&#x0A;* 600 DPI&#x0D;&#x0A;* Bluetooth</ArticleDescription>

Is there a function in the lxml library to turn off the conversion or does anyone know a way to solve this problem? Thanks in advance.

Answer 1

This is not a Python or lxml issue; it is how XML parsers and serializers work. If you want a specific character in your text, put that character itself into the string in your programming language. The serializer will convert it into an entity or character reference if required, and the parser will convert it back when reading the document. You cannot turn this off, because doing so would go against the specification.
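As a minimal sketch of that round trip (the element name is reused from the question, and the text value "Cables & Adapters" is made up for illustration), the serializer escapes the ampersand and the parser restores it:

```python
from lxml import etree as et

# Hypothetical text value containing a literal ampersand.
el = et.Element("ArticleDescription")
el.text = "Cables & Adapters"

# Serializing escapes '&' so that the output is well-formed XML.
serialized = et.tostring(el)
print(serialized)  # b'<ArticleDescription>Cables &amp; Adapters</ArticleDescription>'

# Parsing the serialized bytes gives back the original string.
parsed = et.fromstring(serialized)
print(parsed.text == el.text)  # True
```

Any conforming consumer sees the same text after parsing, so the escaped form in the file loses no information.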

An exception might be to use a CDATA section, as explained in "What does <![CDATA[]]> in XML mean?"
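A short sketch of the CDATA route, using lxml's et.CDATA wrapper, is below. It assumes the target system would accept a CDATA section at all, which the question does not say. Note that inside CDATA the sequence &#x0D; is plain text, not a character reference, so a consumer would see those seven literal characters rather than a carriage return.

```python
from lxml import etree as et

el = et.Element("ArticleDescription")
# Wrap the text in a CDATA section so the serializer does not escape '&'.
el.text = et.CDATA("* 4 Buttons&#x0D;&#x0A;* 600 DPI")

serialized = et.tostring(el)
print(serialized)

# Parsing it back yields the literal characters '&#x0D;&#x0A;',
# not an actual carriage return and line feed.
parsed_text = et.fromstring(serialized).text
print(repr(parsed_text))
```

So CDATA keeps the ampersand unescaped in the file, but it does not produce the character references the question asks for.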

Answer 2

The output of the Python script:

import lxml.etree as et
print(repr(et.fromstring('''<ArticleDescription>* 4 Buttons&#x0D;&#x0A;* 600 DPI&#x0D;&#x0A;* Bluetooth</ArticleDescription>''').text))

...is...

'* 4 Buttons\r\n* 600 DPI\r\n* Bluetooth'

That means that the Python-syntax way to write the XML-syntax string * 4 Buttons&#x0D;&#x0A;* 600 DPI&#x0D;&#x0A;* Bluetooth is '* 4 Buttons\r\n* 600 DPI\r\n* Bluetooth'.

Thus, the relevant line of code should be:

e.ArticleDescription('* 4 Buttons\r\n* 600 DPI\r\n* Bluetooth')

...and if the consumer doesn't treat the resulting output as exactly identical to <ArticleDescription>* 4 Buttons&#x0D;&#x0A;* 600 DPI&#x0D;&#x0A;* Bluetooth</ArticleDescription>, that consumer is broken.

See https://replit.com/@CharlesDuffy2/ImportantClassicConversion#test.py for your code running with the modification suggested above.
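For readers who cannot open the link, here is a self-contained sketch of the same idea (not a copy of the linked script). It builds only the ArticleDescription element from the question with real \r\n characters and checks that the text survives serialization and parsing. The exact spelling of the character reference that lxml emits for the carriage return may differ from &#x0D;, so the sketch prints it rather than assuming it.

```python
from lxml import etree as et
from lxml.builder import ElementMaker

e = ElementMaker()

# Use the actual control characters, not their XML character references.
description = '* 4 Buttons\r\n* 600 DPI\r\n* Bluetooth'
element = e.ArticleDescription(description)

# The serializer chooses how to encode the carriage return;
# print the result to see which form it picked.
serialized = et.tostring(element)
print(serialized)

# A conforming parser recovers exactly the original string.
round_tripped = et.fromstring(serialized).text
print(repr(round_tripped))
```

If the target system parses the file with a conforming XML parser, it receives the carriage returns and line feeds regardless of how the references are spelled in the file.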
