Expanding XML data column in Pandas dataframe and save it as new columns

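I have a dataset of shape (11145, 14). One of its columns holds a fairly complex XML value. I am trying to expand this XML column and add its contents as new columns. Here is an example of the XML (I changed the values for privacy reasons, but the structure is the same):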

'
<?xml version="1.0" encoding="UTF-8"?>
<modulo
    xmlns="http://www.sadasdasdasdasd.it/12312312312/Fasdasdasda"
    xmlns:xsi="http://www.sss1231231233.org/200232321/XMLSchema-instance">
    <nomeTxt dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="Nome" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdasdasdasdqw">
        <![CDATA[*****]]>
    </nomeTxt>
    <adasdasdasdaq2qwdwasxasxas dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="asdasdasdqweqwe" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdasdasd2szszxc">
        <![CDATA[*****]]>
    </adasdasdasdaq2qwdwasxasxas>
    <qweweqweqweqweqweqwe dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="sdsdsds" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasd">
        <![CDATA[M]]>
    </qweweqweqweqweqweqwe>
    <qewtrweqrqwerqwrqweqw dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="qewtrweqrqwerqwrqweqw" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdas">
        <![CDATA[213123123123]]>
    </qewtrweqrqwerqwrqweqw>
    <qewtrweqrqwerqwrqzxczxcasxcasxweqw dataFill="ew.fill(\'date\')" dataFillMode="auto" modelCodeMeaning="Data di nascita" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasfafassadasdasdasdas">
        <![CDATA[1927-21-13]]>
    </qewtrweqrqwerqwrqzxczxcasxcasxweqw>
    <sadasdasdasdasdsa codeValue="0" codeScheme="asdasdasdasdasdasd" codeMeaning="No" codeSchemeVersion="01">
        <![CDATA[No]]>
    </rbg_allergiefarmacologiche>
    <xczcxzcxzczxczxcz codeValue="0" codeScheme="asdasdasdasdasdasd" codeMeaning="No" codeSchemeVersion="01">
        <![CDATA[No]]>
    </xczcxzcxzczxczxcz>
    <asdasfascasasxasx codeValue="0" codeScheme="asdasdasdas" codeMeaning="No" codeSchemeVersion="01">
        <![CDATA[No]]>
    </asdasfascasasxasx>
    <asdasxcasxasxasxzxxz>
        <![CDATA[false]]>
    </asdasxcasxasxasxzxxz>
    <asxasxasxsaxasx xsi:nil="true"></asxasxasxsaxasx>
    <saxasx>
        <![CDATA[false]]>
    </saxasx>
    <asdasxasxasxas xsi:nil="true"></asdasxasxasxas>
    <asasdasdasdas>
        <![CDATA[false]]>
    </asasdasdasdas>
    <asasdasdasdasasasasd xsi:nil="true"></asasdasdasdasasasasd>
    <asasdasdasasd>
        <![CDATA[false]]>
    </asasdasdasasd>
    <zcxzcxzc xsi:nil="true"></zcxzcxzc>
</modulo>'

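I tried looping through the column with a for loop, converting each value to a dictionary, and then saving the result as columns. The problem with this solution is that every row has different XML tags, and a different number of them, so my solution doesn't work.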

df["XML_column"]
0        <?xml version="1.0" encoding="UTF-8"?><modulo ...
1        <?xml version="1.0" encoding="UTF-8"?><modulo ...
2        <?xml version="1.0" encoding="UTF-8"?><modulo ...
3        <?xml version="1.0" encoding="UTF-8"?><modulo ...
4        <?xml version="1.0" encoding="UTF-8"?><modulo ...
                               ...                        
11140    <?xml version="1.0" encoding="UTF-8"?><modulo ...
11141    <?xml version="1.0" encoding="UTF-8"?><modulo ...
11142    <?xml version="1.0" encoding="UTF-8"?><modulo ...
11143    <?xml version="1.0" encoding="UTF-8"?><modulo ...
11144    <?xml version="1.0" encoding="UTF-8"?><modulo ...

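Welcome. Your XML seems a bit bumpy (for example, the <sadasdasdasdasdsa> element is closed by </rbg_allergiefarmacologiche>). If I take a clean snippet, such as this one: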

xml = '''<?xml version="1.0" encoding="UTF-8"?>
<modulo
    xmlns="http://www.sadasdasdasdasd.it/12312312312/Fasdasdasda"
    xmlns:xsi="http://www.sss1231231233.org/200232321/XMLSchema-instance">
    <nomeTxt dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="Nome" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdasdasdasdqw">
        <![CDATA[*****]]>
    </nomeTxt>
    <adasdasdasdaq2qwdwasxasxas dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="asdasdasdqweqwe" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdasdasd2szszxc">
        <![CDATA[*****]]>
    </adasdasdasdaq2qwdwasxasxas>
    <qweweqweqweqweqweqwe dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="sdsdsds" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasd">
        <![CDATA[M]]>
    </qweweqweqweqweqweqwe>
    <qewtrweqrqwerqwrqweqw dataFill="ew.fill()" dataFillMode="auto" modelCodeMeaning="qewtrweqrqwerqwrqweqw" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasdasdas">
        <![CDATA[213123123123]]>
    </qewtrweqrqwerqwrqweqw>
    <qewtrweqrqwerqwrqzxczxcasxcasxweqw dataFill="ew.fill(\'date\')" dataFillMode="auto" modelCodeMeaning="Data di nascita" modelCodeScheme="asdasdasdas" modelCodeSchemeVersion="01" modelCodeValue="asdasfafassadasdasdasdas">
        <![CDATA[1927-21-13]]>
    </qewtrweqrqwerqwrqzxczxcasxcasxweqw>
</modulo>'''

I can do the following (only the first six columns are shown here):

pd.read_xml(xml, parser='etree')
          dataFill  dataFillMode       modelCodeMeaning  modelCodeScheme  modelCodeSchemeVersion            modelCodeValue
0        ew.fill()          auto                   Nome      asdasdasdas                       1      asdasdasdasdasdasdqw
1        ew.fill()          auto        asdasdasdqweqwe      asdasdasdas                       1    asdasdasdasdasd2szszxc
2        ew.fill()          auto                sdsdsds      asdasdasdas                       1                 asdasdasd
3        ew.fill()          auto  qewtrweqrqwerqwrqweqw      asdasdasdas                       1               asdasdasdas
4  ew.fill('date')          auto        Data di nascita      asdasdasdas                       1  asdasfafassadasdasdasdas
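
The call above handles a single document. For the whole column, where each row carries a different set of tags, one possible approach is to flatten every document into a dict and let pandas align the keys afterwards. The following is only a sketch under assumptions, not a definitive solution: the helper names `local_name` and `xml_to_record`, the `tag@attribute` key convention, and the sample documents (with the made-up tags `sesso` and `extra` and the placeholder namespace) are all hypothetical. It uses only the standard library `xml.etree.ElementTree`, and rows that are not well-formed yield an empty dict instead of raising.

```python
import xml.etree.ElementTree as ET


def local_name(name):
    """Drop a leading '{namespace}' prefix from a tag or attribute name."""
    return name.rsplit("}", 1)[-1]


def xml_to_record(xml_text):
    """Flatten one XML document into a dict.

    Each direct child of the root contributes one key for its text and one
    'tag@attribute' key per attribute (a naming convention assumed here).
    Documents that are not well-formed produce an empty dict.
    """
    try:
        root = ET.fromstring(xml_text.strip())
    except ET.ParseError:
        return {}
    record = {}
    for child in root:
        name = local_name(child.tag)
        text = (child.text or "").strip()
        record[name] = text or None  # empty elements become None
        for attr, value in child.attrib.items():
            record[f"{name}@{local_name(attr)}"] = value
    return record


# Hypothetical sample rows: different tags per row, plus one broken document.
rows = [
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<modulo xmlns="http://example.invalid/ns">'
    '<nomeTxt dataFillMode="auto"><![CDATA[*****]]></nomeTxt>'
    '<sesso><![CDATA[M]]></sesso>'
    '</modulo>',
    '<modulo xmlns="http://example.invalid/ns">'
    '<sesso>F</sesso><extra>1</extra>'
    '</modulo>',
    '<modulo><a></b></modulo>',  # mismatched tags, not well-formed
]

records = [xml_to_record(row) for row in rows]
for record in records:
    print(record)
```

With the real data, the list could be built as `records = [xml_to_record(x) for x in df["XML_column"]]` and attached with `df.join(pd.DataFrame(records, index=df.index))`. Keys that are missing from a given row come out as NaN, so the varying number of tags per row is not a problem, and rows whose XML could not be parsed end up with NaN in every new column.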
