Using an XML "map" to import data into MySQL with Python?

Question

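I have a number of different CSV files that need to be processed hourly into different tables of a MySQL database. I am writing an ingestion engine in Python that takes an XML mapping file that looks like this: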

<table name = "example_table">
    <column name = 'test'>
        <source type = 'sql'>schema.table_name.column_name</source>
    </column>
    <column name = 'test_2'>
        <source type = 'csv'>column_name</source>
    </column>
</table>

and uses it to determine where to insert the data in the MySQL database. In this example:

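Find the column "test" in the table "example_table" and fill in the data from another SQL table, "schema.table_name.column_name" (usually the primary key of another table).
Find the column "test_2" and fill in the data from the CSV file, keyed by "column_name".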

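Am I way off here? Does this seem like a reasonable approach? The goal is to have one Python engine and multiple XML map files so that I can process each set of inserts efficiently.

Is there a better way?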

Answer 1

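There is nothing inherently wrong with this scheme. XML is a bit verbose and overkill for such a simple mapping, but it is highly standardized, easy to edit, and will work fine. You should be able to iterate over this structure easily: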

from lxml import objectify

# xml_source is assumed to hold the mapping XML shown in the question
table = objectify.fromstring(xml_source)
print("table name:", table.attrib['name'])
for col in table.column:
    print("column name:", col.attrib['name'])
    print("source type:", col.source.attrib['type'])
    print("source col:", col.source.text)
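If you would rather not depend on lxml, the same traversal can probably be done with the standard library. The following is only a sketch: `XML_SOURCE` is the sample mapping from the question, and `load_mapping` is a hypothetical helper name, not part of any library.

```python
import xml.etree.ElementTree as ET

# Sample mapping copied from the question (quotes normalized).
XML_SOURCE = """
<table name="example_table">
    <column name="test">
        <source type="sql">schema.table_name.column_name</source>
    </column>
    <column name="test_2">
        <source type="csv">column_name</source>
    </column>
</table>
"""


def load_mapping(xml_text):
    """Return (table_name, list of column dicts) for one <table> mapping.

    Hypothetical helper: it assumes every <column> has exactly one <source>.
    """
    table = ET.fromstring(xml_text)
    columns = []
    for col in table.findall("column"):
        source = col.find("source")
        columns.append({
            "name": col.get("name"),
            "source_type": source.get("type"),
            "source_col": source.text.strip(),
        })
    return table.get("name"), columns


table_name, columns = load_mapping(XML_SOURCE)
print("table name:", table_name)
for col in columns:
    print(col["name"], col["source_type"], col["source_col"])
```

Turning the mapping into plain dicts up front means the rest of the engine does not need to care whether the file was XML, JSON or YAML.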

These days, many developers prefer JSON or YAML configuration files over XML. For example, if you wanted a similar mapping in JSON:

{
   "table":"example_table",
   "columns":[
      {
         "name":"test",
         "source_type":"sql",
         "source_col":"schema.table_name.column_name"
      },
      {
         "name":"test_2",
         "source_type":"csv",
         "source_col":"column_name"
      }
   ]
}

You can iterate over that just as easily:

import json

# json_source is assumed to hold the JSON document shown above
j = json.loads(json_source)
print("table name:", j['table'])
for col in j['columns']:
    print("column name:", col['name'])
    print("source type:", col['source_type'])
    print("source col:", col['source_col'])

Whichever specific format you choose, a data-driven recipe is a flexible way to drive your ingestion engine.
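To make the "data-driven" idea a little more concrete, here is a rough sketch of how a mapping could drive the insert step. It is not a full engine: `build_insert` is a hypothetical helper, it only handles the `csv`-sourced columns (how `sql`-sourced columns get looked up is left open, since the question does not specify it), and it assumes the table and column names come from a trusted mapping file, because identifiers cannot be passed as query parameters.

```python
import csv
import io


def build_insert(mapping, csv_text):
    """Build a parameterized INSERT plus row tuples from csv-sourced columns.

    Hypothetical helper; `mapping` has the shape of the JSON example above.
    """
    csv_cols = [c for c in mapping["columns"] if c["source_type"] == "csv"]
    names = ", ".join("`{}`".format(c["name"]) for c in csv_cols)
    placeholders = ", ".join(["%s"] * len(csv_cols))
    sql = "INSERT INTO `{}` ({}) VALUES ({})".format(
        mapping["table"], names, placeholders
    )
    # Each CSV record is read as a dict keyed by the header row.
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [tuple(rec[c["source_col"]] for c in csv_cols) for rec in reader]
    return sql, rows


# Made-up sample data, for illustration only.
mapping = {
    "table": "example_table",
    "columns": [
        {"name": "test", "source_type": "sql",
         "source_col": "schema.table_name.column_name"},
        {"name": "test_2", "source_type": "csv", "source_col": "column_name"},
    ],
}
csv_text = "column_name,other\na,1\nb,2\n"

sql, rows = build_insert(mapping, csv_text)
print(sql)
print(rows)
```

With a DB-API MySQL driver that uses `%s` placeholders, the result could then be handed to something like `cursor.executemany(sql, rows)`.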
