简体   繁体   中英

Import XML into SQL database

I'm working with a 20 gig XML file that I would like to import into a SQL database (preferably MySQL, since that is what I am familiar with). This seems like it would be a common task, but after Googling around a bit I haven't been able to figure out how to do it. What is the best way to do this?

I know this ability is built into MySQL 6.0, but that is not an option right now because it is an alpha development release.

Also, if I have to do any scripting I would prefer to use Python because that's what I am most familiar with.

Thanks.

You can use the getiterator() function to iterate over the XML file without parsing the whole thing at once. You can do this with ElementTree , which is included in the standard library, or with lxml .

for record in root.getiterator('record'):
    add_element_to_database(record) # Depends on your database interface.
                                    # I recommend SQLAlchemy.

Take a look at the iterparse() function from ElementTree or cElementTree (I guess cElementTree would be best if you can use it)

This piece describes more or less what you need to do: http://effbot.org/zone/element-iterparse.htm#incremental-parsing

This will probably be the most efficient way to do it in Python. Make sure not to forget to call .clear() on the appropriate elements (you really don't want to build an in memory tree of a 20gig xml file: the .getiterator() method described in another answer is slightly simpler, but does require the whole tree first - I assume that the poster actually had iterparse() in mind as well)

I've done this several times with Python, but never with such a big XML file. ElementTree is an excellent XML library for Python that would be of assistance. If it was possible, I would divide the XML up into smaller files to make it easier to load into memory and parse.

It may be a common task, but maybe 20GB isn't as common with MySQL as it is with SQL Server.

I've done this using SQL Server Integration Services and a bit of custom code. Whether you need either of those depends on what you need to do with 20GB of XML in a database. Is it going to be a single column of a single row of a table? One row per child element?

SQL Server has an XML datatype if you simply want to store the XML as XML. This type allows you to do queries using XQuery, allows you to create XML indexes over the XML, and allows the XML column to be "strongly-typed" by referring it to a set of XML schemas, which you store in the database.

MySQL文档似乎并不表示XML导入仅限于版本6.它显然也适用于5。

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM