
xmltodict: how to ignore some characters when converting an XML file to a JSON file

I would like to remove some characters from the keys when I convert my XML to a dict:

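import json
import xmltodict

def convert():  # enclosing function implied by the original "return data"
    data = xmltodict.parse(open('test.xml').read())
    # Write the parsed dict out as pretty-printed JSON
    with open('test2.json', "wt", encoding='utf-8', errors='ignore') as f:
        json.dump(data, f, indent=4, sort_keys=True)
    return data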

The problem is that I have many JSON files. Some of them look like this:

{
        "pcrs:test A": {
            "pcrs:nature": "03", 
            "pcrs:producteur": "SIEML"
}}

And some look like this (without pcrs):

{
        "test B": {
            "nature": "03", 
            "producteur": "SIEML"
}}

How can I force any file like the first example to be written without 'pcrs:', as in the second example?

The pcrs: part is an XML namespace prefix. Because you didn't include sample XML, I've made up one of my own.

<?xml version="1.0" encoding="UTF-8"?>
<root_elem xmlns:pcrs="http://the/pcrs/url">
<pcrs:subelem/>
</root_elem>

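xmltodict lets you manage namespaces by mapping the namespace URL to a different representation. Most notably, mapping a URL to None removes that namespace from the keys completely. This only takes effect when process_namespaces=True is passed. See "Namespace support" in the xmltodict README.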

In your case, you can do

data = xmltodict.parse(open('test.xml').read(),
    process_namespaces=True,
    namespaces={"http://the/pcrs/url":None})

substituting the real namespace URL (the value of the xmlns:pcrs attribute in your XML) for http://the/pcrs/url.
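If the JSON files have already been written, or the namespace URL is not known in advance, the prefix could also be removed after parsing by rewriting the dict keys. This is a different technique from the namespaces mapping above, not something xmltodict provides. What follows is a minimal sketch, assuming the unwanted prefix is literally "pcrs:" at the start of a key; the helper name strip_prefix is hypothetical.

```python
import json


def strip_prefix(node, prefix="pcrs:"):
    """Return a copy of node with prefix removed from the start of every dict key.

    strip_prefix is a hypothetical helper for illustration, not part of xmltodict.
    """
    if isinstance(node, dict):
        # Rename each key if needed, then recurse into its value
        return {
            (key[len(prefix):] if key.startswith(prefix) else key): strip_prefix(value, prefix)
            for key, value in node.items()
        }
    if isinstance(node, list):
        # Repeated XML elements become lists, so recurse into each item
        return [strip_prefix(item, prefix) for item in node]
    # Strings, numbers and None are returned unchanged
    return node


data = {"pcrs:test A": {"pcrs:nature": "03", "pcrs:producteur": "SIEML"}}
cleaned = strip_prefix(data)
print(json.dumps(cleaned, indent=4, sort_keys=True))
```

This sketch only touches keys that start with the prefix, so attribute keys such as "@pcrs:id" are left as they are. If a prefixed and an unprefixed key share the same local name, one of them will silently overwrite the other. Where the namespace URL is known, the namespaces mapping is probably the cleaner choice.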

