Importing XML data into a MySQL database using Python

Question

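I'm having trouble with attributes when importing XML data into SQL. I've tried 4 or 5 different ways, but I must be looking at the problem wrong. I'm new to Python but have experience with SQL and other languages. I've tried several different approaches using XPath, but it still doesn't work.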

<?xml version="1.0" encoding="UTF-8"?>
<PARTS>
    <Header>
        <Version>6.5</Version>
    </Header>
    <Items>
        <Item MaintenanceType="A">
            <HazardousMaterialCode>N</HazardousMaterialCode>
            <ItemLevelGTIN GTINQualifier="UP">00651860733074</ItemLevelGTIN>
            <PartNumber>14-230615</PartNumber>
            <BrandAAIAID>BBGL</BrandAAIAID>
            <BrandLabel>Bilstein</BrandLabel>
            <ACESApplications>Y</ACESApplications>
            <ItemQuantitySize UOM="EA">1.0</ItemQuantitySize>
            <ContainerType>BX</ContainerType>
            <QuantityPerApplication Qualifier="MAX" UOM="EA">1</QuantityPerApplication>
        </Item>
    </Items>
</PARTS>

from xml.etree import ElementTree
import mysql.connector

file_name = 'bsn.xml'
dom = ElementTree.parse(file_name)

mydb = mysql.connector.connect(user='frank', password='xxxxxx', host='127.0.0.1', database='scoresre', auth_plugin='mysql_native_password')
mycursor = mydb.cursor()

item = dom.findall('Items/Item')

for x in item:
    PartNumber = x.find('PartNumber').text
    BrandAAIAID = x.find('BrandAAIAID').text
    BrandLabel = x.find('BrandLabel').text
    ItemLevelGTIN = x.find('ItemLevelGTIN').text
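    # NOTE: this returns None, because GTINQualifier is an attribute of <ItemLevelGTIN>, not an element
    GTINQualifier = x.find('.//GTINQualifier[@attr="UP"]')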
    print(PartNumber, BrandAAIAID, BrandLabel, ItemLevelGTIN, GTINQualifier)

    val = (PartNumber, BrandAAIAID, BrandLabel, ItemLevelGTIN, GTINQualifier)

    sql = "INSERT INTO scoreitem (B15_PartNumber, B20_BrandAAIAID, B25_BrandLabel, B10_ItemLevelGTIN, " \
          "B11_GTINQualifier) VALUES (%s, %s, %s, %s, %s)"

    mycursor.execute(sql, val)
    mydb.commit()

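The code does not import the attribute value in GTINQualifier="UP": UP comes through as NULL. The same happens with ItemQuantitySize UOM="EA": when I use the same syntax as above, EA becomes NULL, and for Qualifier="MAX" UOM="EA", MAX and EA also come through as NULL. Thanks in advance.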
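The NULLs come from the GTINQualifier line: the path `'.//GTINQualifier[@attr="UP"]'` searches for an element named GTINQualifier, but GTINQualifier (like UOM and Qualifier) is an attribute of another element, so find() returns None and None is inserted as NULL. Attribute values are read from their element with Element.get() (or the .attrib dict). Below is a minimal sketch of that, assuming an in-memory sqlite3 database as a stand-in for MySQL so it runs without a server (with mysql.connector the placeholders would be %s instead of ?). The table layout and the helper names attr() and item_rows() are hypothetical, not the real scoreitem columns.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Trimmed copy of the <Item> from the question
XML = """<PARTS>
  <Items>
    <Item MaintenanceType="A">
      <ItemLevelGTIN GTINQualifier="UP">00651860733074</ItemLevelGTIN>
      <PartNumber>14-230615</PartNumber>
      <BrandAAIAID>BBGL</BrandAAIAID>
      <BrandLabel>Bilstein</BrandLabel>
      <ItemQuantitySize UOM="EA">1.0</ItemQuantitySize>
      <QuantityPerApplication Qualifier="MAX" UOM="EA">1</QuantityPerApplication>
    </Item>
  </Items>
</PARTS>"""


def attr(parent, tag, name):
    """Hypothetical helper: attribute `name` of child `tag`, or None if the child is missing."""
    child = parent.find(tag)
    return child.get(name) if child is not None else None


def item_rows(root):
    """Yield one tuple per <Item>: element text via findtext(), attributes via get()."""
    for item in root.findall('Items/Item'):
        yield (
            item.findtext('PartNumber'),
            item.findtext('BrandAAIAID'),
            item.findtext('BrandLabel'),
            item.findtext('ItemLevelGTIN'),
            attr(item, 'ItemLevelGTIN', 'GTINQualifier'),       # UP
            attr(item, 'ItemQuantitySize', 'UOM'),              # EA
            attr(item, 'QuantityPerApplication', 'Qualifier'),  # MAX
            attr(item, 'QuantityPerApplication', 'UOM'),        # EA
        )


root = ET.fromstring(XML)
rows = list(item_rows(root))

# Stand-in for MySQL: in-memory sqlite3 with a hypothetical 8-column table.
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE scoreitem (part TEXT, brand_id TEXT, brand_label TEXT, gtin TEXT, "
             "gtin_qualifier TEXT, size_uom TEXT, qpa_qualifier TEXT, qpa_uom TEXT)")
conn.executemany("INSERT INTO scoreitem VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
conn.commit()
print(conn.execute("SELECT gtin_qualifier, size_uom, qpa_qualifier, qpa_uom FROM scoreitem").fetchall())
# [('UP', 'EA', 'MAX', 'EA')]
```

executemany() also exists on mysql.connector cursors, so the rows can be sent in one call and committed once instead of committing after every row.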

Answer 1

Solution A:

This solution follows the same approach you were trying.

from xml.etree import ElementTree

dom = ElementTree.parse('bsn.xml')
items = dom.findall('Items/Item')
# use a '@' before any attribute in the list of target columns. This will help
# us treat the attributes separately from the regular tags.
target_cols = ['PartNumber', 'BrandAAIAID', 'BrandLabel', 'ItemLevelGTIN', '@GTINQualifier']

for item in items:
    instance_dict = dict()
    for col in target_cols:
        label = col.replace("@", "")
        if col.startswith("@"):
            # './/*[@GTINQualifier]' finds the first descendant element that HAS this
            # attribute (here ItemLevelGTIN); the value is then read from its .attrib
            instance_dict.update({col: item.find('.//*[@{}]'.format(label)).attrib[label]})
        else:
            instance_dict.update({col: item.find(label).text})

    val = tuple(instance_dict[col_name] for col_name in target_cols)
    print(instance_dict)
    # write to db here, e.g. with mycursor and sql from the question:
    # mycursor.execute(sql, val)
    # mydb.commit()
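Solution A relies on ElementTree's limited XPath support. The sketch below, on a trimmed copy of the sample XML, shows what the attribute predicates do: `[@name]` and `[@name='value']` only filter elements, so the match is an element and the attribute value still has to be read with .get() or .attrib. It also shows why the path from the question returns None.

```python
import xml.etree.ElementTree as ET

XML = """<PARTS><Items>
  <Item MaintenanceType="A">
    <ItemLevelGTIN GTINQualifier="UP">00651860733074</ItemLevelGTIN>
    <QuantityPerApplication Qualifier="MAX" UOM="EA">1</QuantityPerApplication>
  </Item>
</Items></PARTS>"""

root = ET.fromstring(XML)
item = root.find('Items/Item')

# The question's path looks for an ELEMENT named GTINQualifier, which doesn't exist.
print(item.find('.//GTINQualifier[@attr="UP"]'))  # None

# [@name] matches elements that have the attribute; the result is the element.
gtin_el = item.find('.//*[@GTINQualifier]')
print(gtin_el.tag, gtin_el.get('GTINQualifier'))  # ItemLevelGTIN UP

# [@name='value'] filters on the attribute value.
print(item.find("ItemLevelGTIN[@GTINQualifier='UP']").text)  # 00651860733074

# ElementTree cannot return an attribute node from a path, so read it from .attrib.
print(item.find('QuantityPerApplication').attrib)
```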

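Solution B:

This solution reads the XML data into a dict per <Item> (and optionally a DataFrame). You can then pick out the fields you need to write to the database.

Let's make some data

I introduced a dummy second <Item></Item> tag to check that multiple tags are handled correctly.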

xml_string = """
<PARTS>
    <Header>
        <Version>6.5</Version>
    </Header>
    <Items>
        <Item MaintenanceType="A">
            <HazardousMaterialCode>N</HazardousMaterialCode>
            <ItemLevelGTIN GTINQualifier="UP">00651860733074</ItemLevelGTIN>
            <PartNumber>14-230615</PartNumber>
            <BrandAAIAID>BBGL</BrandAAIAID>
            <BrandLabel>Bilstein</BrandLabel>
            <ACESApplications>Y</ACESApplications>
            <ItemQuantitySize UOM="EA">1.0</ItemQuantitySize>
            <ContainerType>BX</ContainerType>
            <QuantityPerApplication Qualifier="MAX" UOM="EA">1</QuantityPerApplication>
        </Item>
        <Item MaintenanceType="B">
            <HazardousMaterialCode>N</HazardousMaterialCode>
            <ItemLevelGTIN GTINQualifier="UP">00651860733084</ItemLevelGTIN>
            <PartNumber>14-230620</PartNumber>
            <BrandAAIAID>BBGL</BrandAAIAID>
            <BrandLabel>BilsteinZ</BrandLabel>
            <ACESApplications>Y</ACESApplications>
            <ItemQuantitySize UOM="EA">1.0</ItemQuantitySize>
            <ContainerType>BX</ContainerType>
            <QuantityPerApplication Qualifier="MAX" UOM="EA">1</QuantityPerApplication>
        </Item>
    </Items>
</PARTS>
"""

Solution

(Optional) pandas is used only at the end, to view the parsed items as a DataFrame.

Import libraries

import pandas as pd
from xml.etree import ElementTree as ET
import mysql.connector

Now walk through the child tags of each <Item> and write each row to the database.

target_cols = ['PartNumber', 'BrandAAIAID', 'BrandLabel', 'ItemLevelGTIN', 'GTINQualifier']

sql = "INSERT INTO scoreitem (B15_PartNumber, B20_BrandAAIAID, B25_BrandLabel, B10_ItemLevelGTIN, " \
      "B11_GTINQualifier) VALUES (%s, %s, %s, %s, %s)"

dict_cols = [None]  # [None] keeps every tag/attribute; or set it equal to target_cols

write_to_db = False  # Set this to True when you write to db

file_name = 'bsn.xml'

# Let us test with the xml_string first
# set 'xml_source_is_file = True'
# when working with a file.
xml_source_is_file = False
if xml_source_is_file:
    # ET.parse() returns an ElementTree; getroot() gives the <PARTS> element
    dom = ET.parse(file_name).getroot()
else:
    # ET.fromstring() already returns the root <PARTS> element
    dom = ET.fromstring(xml_string)

if write_to_db:
    mydb = mysql.connector.connect(user='frank', password='xxxxxx',
                                   host='127.0.0.1',
                                   database='scoresre',
                                   auth_plugin='mysql_native_password')
    mycursor = mydb.cursor()

consolidated_dict = dict()

for xx in list(dom):
    if xx.tag == 'Items':
        for i, item in enumerate(xx):
            instance_dict = dict()
            for ee in list(item):
                # Element text, e.g. PartNumber -> '14-230615'
                kee = ee.tag
                if (kee in dict_cols) or (dict_cols[0] is None):
                    instance_dict.update({kee: ee.text})
                # Attributes of the element, e.g. GTINQualifier -> 'UP'.
                # Keys are bare attribute names, so a repeated name (UOM on both
                # ItemQuantitySize and QuantityPerApplication) keeps the last value.
                for jj, attr_value in ee.attrib.items():
                    if (jj in dict_cols) or (dict_cols[0] is None):
                        instance_dict.update({jj: attr_value})
            consolidated_dict.update({i: instance_dict})
            val = tuple(instance_dict[col_name] for col_name in target_cols)
            print(val)
            # Write to db here
            if write_to_db:
                mycursor.execute(sql, val)
                mydb.commit()

if write_to_db:
    mycursor.close()
    mydb.close()

df = pd.DataFrame(consolidated_dict).T
print(df)

PS: Note that there are two switches, write_to_db and xml_source_is_file. In your case, set both to True to read the data from the XML file and write it to the database.
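Because Solution B keys attributes by their bare name, the UOM on ItemQuantitySize and the UOM on QuantityPerApplication collide in one dict, which matters for the MAX/EA columns in the question. A variation (not part of the answer above) is to store each attribute under its element's tag. The 'Tag@attr' key format and the flatten_item() name are assumptions of this sketch; a list of such dicts can also be passed straight to pd.DataFrame(rows).

```python
import xml.etree.ElementTree as ET

XML = """<PARTS><Items>
  <Item MaintenanceType="A">
    <ItemLevelGTIN GTINQualifier="UP">00651860733074</ItemLevelGTIN>
    <PartNumber>14-230615</PartNumber>
    <ItemQuantitySize UOM="EA">1.0</ItemQuantitySize>
    <QuantityPerApplication Qualifier="MAX" UOM="EA">1</QuantityPerApplication>
  </Item>
</Items></PARTS>"""


def flatten_item(item):
    """Flatten one <Item>: child text under 'Tag', each attribute under 'Tag@attr'
    (naming scheme chosen for this sketch), so repeated names like UOM don't collide."""
    row = {f'Item@{k}': v for k, v in item.attrib.items()}
    for child in item:
        row[child.tag] = child.text
        for name, value in child.attrib.items():
            row[f'{child.tag}@{name}'] = value
    return row


root = ET.fromstring(XML)
rows = [flatten_item(item) for item in root.findall('Items/Item')]
print(rows[0]['ItemQuantitySize@UOM'], rows[0]['QuantityPerApplication@UOM'])  # EA EA

# Pick the INSERT columns in a fixed order (hypothetical selection).
wanted = ['PartNumber', 'ItemLevelGTIN', 'ItemLevelGTIN@GTINQualifier',
          'QuantityPerApplication@Qualifier']
val = tuple(rows[0].get(col) for col in wanted)
print(val)  # ('14-230615', '00651860733074', 'UP', 'MAX')
```

Using .get(col) means a missing element yields None (NULL) for that column instead of raising a KeyError.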

Answer 1 (accepted, 2019-09-11 04:56:21)

解决方案-A：

解决方案B：

让我们做一些数据

解

导入库

使用python将xml数据导入MYSQL数据库

问题描述

1 个解决方案

解决方案1 1 已采纳 2019-09-11 04:56:21

解决方案-A：

解决方案B：

让我们做一些数据

解

导入库

解决方案1
1 已采纳 2019-09-11 04:56:21