Importing XML data into a MySQL db using Python
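I'm having trouble with attributes when importing XML data into SQL. I've tried 4 or 5 approaches, but I must be looking at the problem the wrong way. I'm new to Python but have experience with SQL and other languages. I've tried several different approaches using XPath, but it still isn't working.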
<?xml version="1.0" encoding="UTF-8"?>
<PARTS>
<Header>
<Version>6.5</Version>
</Header>
<Items>
<Item MaintenanceType="A">
<HazardousMaterialCode>N</HazardousMaterialCode>
<ItemLevelGTIN GTINQualifier="UP">00651860733074</ItemLevelGTIN>
<PartNumber>14-230615</PartNumber>
<BrandAAIAID>BBGL</BrandAAIAID>
<BrandLabel>Bilstein</BrandLabel>
<ACESApplications>Y</ACESApplications>
<ItemQuantitySize UOM="EA">1.0</ItemQuantitySize>
<ContainerType>BX</ContainerType>
<QuantityPerApplication Qualifier="MAX" UOM="EA">1</QuantityPerApplication>
</Item>
</Items>
</PARTS>
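A quick way to see where each value lives is to dump every child of each Item as a (tag, text, attributes) triple. The sketch below uses only the standard-library xml.etree.ElementTree and an inlined, trimmed copy of the sample above; the SAMPLE string and the describe_items helper are illustrative names, not part of the question. It shows that UP, EA, and MAX sit in each element's attrib mapping and never in .text.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the sample XML, inlined so the snippet runs on its own.
SAMPLE = """<PARTS>
<Items>
<Item MaintenanceType="A">
<ItemLevelGTIN GTINQualifier="UP">00651860733074</ItemLevelGTIN>
<PartNumber>14-230615</PartNumber>
<ItemQuantitySize UOM="EA">1.0</ItemQuantitySize>
<QuantityPerApplication Qualifier="MAX" UOM="EA">1</QuantityPerApplication>
</Item>
</Items>
</PARTS>"""


def describe_items(xml_text):
    """Return (tag, text, attributes) for every child of every <Item> (hypothetical helper)."""
    root = ET.fromstring(xml_text)  # root is the <PARTS> element
    rows = []
    for item in root.findall('Items/Item'):
        for child in item:
            rows.append((child.tag, child.text, dict(child.attrib)))
    return rows


for row in describe_items(SAMPLE):
    print(row)
```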
from xml.etree import ElementTree
import mysql.connector
file_name = 'bsn.xml'
dom = ElementTree.parse(file_name)
mydb = mysql.connector.connect(user='frank', password='xxxxxx', host='127.0.0.1', database='scoresre', auth_plugin='mysql_native_password')
mycursor = mydb.cursor()
item = dom.findall('Items/Item')
for x in item:
    PartNumber = x.find('PartNumber').text
    BrandAAIAID = x.find('BrandAAIAID').text
    BrandLabel = x.find('BrandLabel').text
    ItemLevelGTIN = x.find('ItemLevelGTIN').text
    GTINQualifier = x.find('.//GTINQualifier[@attr="UP"]')
    print(PartNumber, BrandAAIAID, BrandLabel, ItemLevelGTIN, GTINQualifier)
    val = (PartNumber, BrandAAIAID, BrandLabel, ItemLevelGTIN, GTINQualifier)
    sql = "INSERT INTO scoreitem (B15_PartNumber, B20_BrandAAIAID, B25_BrandLabel, B10_ItemLevelGTIN, " \
          "B11_GTINQualifier) VALUES (%s, %s, %s, %s, %s)"
    mycursor.execute(sql, val)
    mydb.commit()
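The code doesn't import the attribute: for GTINQualifier="UP", UP comes through as null. The same happens with ItemQuantitySize UOM="EA" (EA becomes null when I use the same syntax as above) and with Qualifier="MAX" UOM="EA" (MAX and EA also come through as NULL). Thanks in advance.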
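Why the value comes back as None: in ElementTree's XPath subset, the name before [...] is an element tag, and [@attr="UP"] matches an attribute literally called attr. The path in the question therefore looks for a GTINQualifier element with an attr attribute, finds nothing, and returns None, which MySQL stores as NULL. The sketch below (standard library only, with a trimmed, inlined Item; the variable names are illustrative) contrasts that lookup with reading attributes through Element.get(). In the question's loop, this would mean something like GTINQualifier = x.find('ItemLevelGTIN').get('GTINQualifier').

```python
from xml.etree import ElementTree as ET

# One <Item> from the question, trimmed to the elements that carry attributes.
ITEM = ET.fromstring(
    '<Item MaintenanceType="A">'
    '<ItemLevelGTIN GTINQualifier="UP">00651860733074</ItemLevelGTIN>'
    '<ItemQuantitySize UOM="EA">1.0</ItemQuantitySize>'
    '<QuantityPerApplication Qualifier="MAX" UOM="EA">1</QuantityPerApplication>'
    '</Item>'
)

# The question's path asks for an element *named* GTINQualifier that has an
# attribute named "attr"; neither exists, so find() returns None.
missing = ITEM.find('.//GTINQualifier[@attr="UP"]')

# Attributes are read with .get() on the element that carries them.
gtin = ITEM.find('ItemLevelGTIN')
gtin_qualifier = gtin.get('GTINQualifier')
quantity_uom = ITEM.find('ItemQuantitySize').get('UOM')

# An [@name="value"] predicate only filters elements by attribute; it still
# returns the element, so .get() is needed to read the values.
qpa = ITEM.find('QuantityPerApplication[@Qualifier="MAX"]')
qpa_qualifier, qpa_uom = qpa.get('Qualifier'), qpa.get('UOM')

print(missing, gtin.text, gtin_qualifier, quantity_uom, qpa_qualifier, qpa_uom)
# None 00651860733074 UP EA MAX EA
```

Element.get() returns None when the attribute is absent, whereas element.attrib['name'] raises KeyError, so get() is the gentler choice when some items may lack an attribute.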
This solution does the same thing you were trying to do.
from xml.etree import ElementTree
dom = ElementTree.parse('bsn.xml')  # same file as in the question
items = dom.findall('Items/Item')
# use a '@' before any attribute in the list of target columns. This will help
# us treat the attributes separately from the regular tags.
target_cols = ['PartNumber', 'BrandAAIAID', 'BrandLabel', 'ItemLevelGTIN', '@GTINQualifier']
for item in items:
    instance_dict = dict()
    for col in target_cols:
        label = col.replace("@", "")
        if col.startswith("@"):
            # find the first descendant that has this attribute, then read its value
            instance_dict.update({col: item.find('.//*[@{}]'.format(label)).attrib[label]})
        else:
            # regular tag: read the element's text
            instance_dict.update({col: item.find(label).text})
    val = tuple(instance_dict[col_name] for col_name in target_cols)
    print(instance_dict)
    # **write to db here**
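At the "write to db here" step, one option (a sketch, not the answerer's code) is to collect each val tuple in a list and send them all with a single executemany() call followed by one commit. The insert_rows helper below is hypothetical and accepts any DB-API 2.0 cursor. mysql.connector expects the %s placeholder used in the question; the demo uses the standard-library sqlite3 with "?" placeholders only so that it runs without a MySQL server, and the TEXT column types are assumptions.

```python
import sqlite3


def insert_rows(cursor, rows, placeholder="%s"):
    """Insert a list of 5-tuples into scoreitem with one executemany() call.

    Hypothetical helper for any DB-API 2.0 cursor: mysql.connector uses the
    "%s" placeholder (as in the question); sqlite3 uses "?".
    """
    marks = ", ".join([placeholder] * 5)
    sql = ("INSERT INTO scoreitem (B15_PartNumber, B20_BrandAAIAID, B25_BrandLabel, "
           f"B10_ItemLevelGTIN, B11_GTINQualifier) VALUES ({marks})")
    cursor.executemany(sql, rows)


# In the answer's loop you would append each val to this list instead.
rows = [("14-230615", "BBGL", "Bilstein", "00651860733074", "UP")]

# An in-memory sqlite3 DB stands in for MySQL so the sketch runs anywhere;
# the TEXT column types are an assumption.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE scoreitem (B15_PartNumber TEXT, B20_BrandAAIAID TEXT, "
            "B25_BrandLabel TEXT, B10_ItemLevelGTIN TEXT, B11_GTINQualifier TEXT)")
insert_rows(cur, rows, placeholder="?")
conn.commit()
print(cur.execute("SELECT * FROM scoreitem").fetchall())
```

Against MySQL, the equivalent calls would be insert_rows(mycursor, rows) and then mydb.commit(), using the default "%s" placeholder.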
Here is a solution that reads the XML data as a dataframe / dict. You can then pick out the fields you need to write to the database.
I introduced a dummy second <Item></Item> tag to check that multiple Item tags are handled correctly.
xml_string = """
<PARTS>
<Header>
<Version>6.5</Version>
</Header>
<Items>
<Item MaintenanceType="A">
<HazardousMaterialCode>N</HazardousMaterialCode>
<ItemLevelGTIN GTINQualifier="UP">00651860733074</ItemLevelGTIN>
<PartNumber>14-230615</PartNumber>
<BrandAAIAID>BBGL</BrandAAIAID>
<BrandLabel>Bilstein</BrandLabel>
<ACESApplications>Y</ACESApplications>
<ItemQuantitySize UOM="EA">1.0</ItemQuantitySize>
<ContainerType>BX</ContainerType>
<QuantityPerApplication Qualifier="MAX" UOM="EA">1</QuantityPerApplication>
</Item>
<Item MaintenanceType="B">
<HazardousMaterialCode>N</HazardousMaterialCode>
<ItemLevelGTIN GTINQualifier="UP">00651860733084</ItemLevelGTIN>
<PartNumber>14-230620</PartNumber>
<BrandAAIAID>BBGL</BrandAAIAID>
<BrandLabel>BilsteinZ</BrandLabel>
<ACESApplications>Y</ACESApplications>
<ItemQuantitySize UOM="EA">1.0</ItemQuantitySize>
<ContainerType>BX</ContainerType>
<QuantityPerApplication Qualifier="MAX" UOM="EA">1</QuantityPerApplication>
</Item>
</Items>
</PARTS>
"""
(Optional) Use pandas to view the whole XML as a dataframe.
import pandas as pd
from xml.etree import ElementTree as ET
import mysql.connector
Now, walk through each tag (and one level of sub-tags) and write to the database.
target_cols = ['PartNumber', 'BrandAAIAID', 'BrandLabel', 'ItemLevelGTIN', 'GTINQualifier']
sql = "INSERT INTO scoreitem (B15_PartNumber, B20_BrandAAIAID, B25_BrandLabel, B10_ItemLevelGTIN, " \
      "B11_GTINQualifier) VALUES (%s, %s, %s, %s, %s)"
dict_cols = [None]  # either [None] (keep every tag and attribute) or equal to target_cols (keep only those)
write_to_db = False  # set this to True when you want to write to the db
file_name = 'bsn.xml'
# Let us test with the xml_string first.
# Set xml_source_is_file = True
# when working with a file.
xml_source_is_file = False
if xml_source_is_file:
    # parse() returns an ElementTree; getroot() gives the <PARTS> element,
    # the same kind of object fromstring() returns below
    dom = ET.parse(file_name).getroot()
else:
    dom = ET.fromstring(xml_string)
if write_to_db:
    mydb = mysql.connector.connect(user='frank', password='xxxxxx',
                                   host='127.0.0.1',
                                   database='scoresre',
                                   auth_plugin='mysql_native_password')
    mycursor = mydb.cursor()
consolidated_dict = dict()
for xx in list(dom):
    if xx.tag == 'Items':
        for i, item in enumerate(xx):
            instance_dict = dict()
            for ee in list(item):
                # the child's text, keyed by its tag
                kee = ee.tag.replace('@', '')
                if (kee in dict_cols) or (dict_cols[0] is None):
                    instance_dict.update({kee: ee.text})
                # descend one level if this child has sub-elements
                if len(ee) > 0:
                    for e in list(ee):
                        ke = e.tag.replace('@', '')
                        if (ke in dict_cols) or (dict_cols[0] is None):
                            instance_dict.update({ke: e.text})
                        temp_dict = e.attrib
                        if len(temp_dict) > 0:
                            for jj in temp_dict.keys():
                                kjj = jj.replace('@', '')
                                if (kjj in dict_cols) or (dict_cols[0] is None):
                                    instance_dict.update({kjj: temp_dict.get(jj)})
                # the child's attributes, keyed by attribute name
                temp_dict = ee.attrib
                if len(temp_dict) > 0:
                    for jj in temp_dict.keys():
                        kjj = jj.replace('@', '')
                        if (kjj in dict_cols) or (dict_cols[0] is None):
                            instance_dict.update({kjj: temp_dict.get(jj)})
            consolidated_dict.update({i: instance_dict})
            val = tuple(instance_dict[col_name] for col_name in target_cols)
            print(val)
            # Write to db here
            if write_to_db:
                mycursor.execute(sql, val)
                mydb.commit()
df = pd.DataFrame(consolidated_dict).T
print(df)
PS: Note that there are two switches, write_to_db and xml_source_is_file. In your case, you need to set both of them to True to write to the database and read the data from the XML file.
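One caveat with the loop above: attributes are keyed by their bare name, so the UOM of ItemQuantitySize and the UOM of QuantityPerApplication share one dict key and the later value overwrites the earlier one. A minimal alternative sketch, assuming you are free to choose the key names, walks each Item with Element.iter() (which is recursive, so deeper nesting is covered as well) and stores attributes under "Tag@attribute". The flatten_item name and the key convention are illustrative, not from the answer.

```python
from xml.etree import ElementTree as ET

# Trimmed copy of the question's sample, inlined so the sketch runs on its own.
SAMPLE = """<PARTS><Items>
<Item MaintenanceType="A">
<ItemLevelGTIN GTINQualifier="UP">00651860733074</ItemLevelGTIN>
<PartNumber>14-230615</PartNumber>
<ItemQuantitySize UOM="EA">1.0</ItemQuantitySize>
<QuantityPerApplication Qualifier="MAX" UOM="EA">1</QuantityPerApplication>
</Item>
</Items></PARTS>"""


def flatten_item(item):
    """Flatten one <Item> element into a dict (hypothetical helper).

    Element text goes under the element's tag; every attribute goes under
    "Tag@attribute", so same-named attributes on different elements (e.g. UOM)
    keep separate keys.
    """
    row = {}
    for el in item.iter():  # iter() yields item itself, then all descendants
        if el is not item and el.text and el.text.strip():
            row[el.tag] = el.text.strip()
        for name, value in el.attrib.items():
            row[f"{el.tag}@{name}"] = value
    return row


root = ET.fromstring(SAMPLE)
rows = [flatten_item(item) for item in root.findall('Items/Item')]

# Pick the columns to insert; row.get() yields None (sent as SQL NULL by
# mysql.connector) for anything an Item does not contain.
cols = ['PartNumber', 'ItemLevelGTIN', 'ItemLevelGTIN@GTINQualifier',
        'ItemQuantitySize@UOM', 'QuantityPerApplication@Qualifier']
vals = [tuple(row.get(c) for c in cols) for row in rows]
print(vals)
```

Because every key is unique, a dict_cols-style filter is not needed while walking; the column choice happens once, when building vals.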
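Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.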