
MySQL 5.6 LOAD XML LOCAL INFILE and empty XML elements

I have a large number of fairly large XML files that I'd like to import into a MySQL table. I'm running CentOS 6.3 and MySQL 5.6. I initially tried 5.5 but ran into issues, and later found out about a bug in 5.5 regarding empty XML tags. Under the impression that this had been resolved in 5.6, I switched to that version.

The XML files contain a number of elements that I have no interest in, so the table the data is inserted into has far fewer fields than there are elements in the XML file; as far as I know this shouldn't be an issue. All of the field names correspond to element names in the XML files.

I have this table

    CREATE TABLE `products` (
        `sku` BIGINT(20) UNSIGNED NOT NULL,
        `productId` BIGINT(20) UNSIGNED NOT NULL,
        `name` VARCHAR(250) NULL,
        `type` VARCHAR(250) NULL,
        `format` VARCHAR(250) NULL,
        `albumTitle` VARCHAR(250) NULL,
        `artistName` VARCHAR(250) NULL,
        `upc` BIGINT(15) UNSIGNED NULL,
        `shortDescription` TEXT NULL,
        `image` VARCHAR(100) NULL,
        INDEX `Index 1` (`productId`),
        INDEX `Index 2` (`name`),
        INDEX `Index 3` (`type`),
        INDEX `Index 4` (`format`)
    )

The XML is in the format

    <products> <product> ... </product> ... </products>

and I'm using this to insert the data

    LOAD XML LOCAL INFILE 'filename.xml' INTO TABLE products ROWS IDENTIFIED BY '<product>';

The correct number of rows is inserted, but every field in the database is empty or NULL. This appears to be the same issue I was seeing with 5.5, whereby XML containing empty tags (e.g. <sku /> as opposed to <sku></sku>) is not handled and causes this sort of result.

I suppose my question is: is there anything I can do to prevent this behavior? Am I doing this correctly?

I had thought about finding and replacing all the empty tags, but this is beyond my knowledge of Linux. If that is an option and someone can suggest a way of achieving it, that would be a great help; any help would be much appreciated.
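Before changing any files, it may help to confirm that they really contain self-closing tags. The following is a minimal sketch, assuming GNU grep (for the `-o` option) and assuming that no tag is split across lines; the function name `count_self_closing` is made up for illustration.

```shell
#!/bin/sh
# Count self-closing elements (e.g. <sku /> or <upc/>) read from stdin.
count_self_closing() {
    # grep -o prints every match on its own line; wc -l counts those lines.
    # [^<>]* keeps each match inside a single tag.
    grep -o '<[^<>]*/>' | wc -l | tr -d ' '
}
```

It could be used as `count_self_closing < filename.xml`; a non-zero count suggests the file is affected.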

To answer my own question, and in case anyone else experiences this issue: I created a simple bash script to remove any empty nodes from each of the files. I called the script clean.sh, and it contained the following:

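    # Strip self-closing (empty) elements such as <sku /> from every XML file
    mkdir -p processed    # output directory, created if missing
    for file in *.xml
    do
        echo "Processing $file"
        # [^<>]* keeps each match inside one tag; a greedy .* would delete
        # everything from the first < to the last /> on a line
        sed 's/<[^<>]*\/>//g' "$file" > "processed/$file"
    done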

Note that I created a new directory called 'processed' where the processed files were placed.
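If you would rather keep the empty elements than delete them, a variation is to expand each self-closing tag into an open/close pair, which is the form the question describes as being handled. This is only a sketch under assumptions: the function name `expand_empty` is made up, it only handles tags without attributes, and how MySQL stores the resulting empty elements (empty string or NULL) has not been checked here.

```shell
#!/bin/sh
# Rewrite attribute-less self-closing tags read from stdin:
#   <sku />  becomes  <sku></sku>
expand_empty() {
    # \1 is the captured element name; " *" allows optional spaces before "/>"
    sed 's/<\([A-Za-z_][A-Za-z0-9_.-]*\) *\/>/<\1><\/\1>/g'
}
```

For example, `expand_empty < filename.xml > processed/filename.xml` would write the rewritten copy into the same directory used above.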

To run the script (assuming your current location is where the script is located) you would just run

    sh clean.sh

After running the same SQL query on a processed file:

    LOAD XML LOCAL INFILE 'filename.xml' INTO TABLE products ROWS IDENTIFIED BY '<product>';

The correct data was imported into the database table. The next step for me is to create another bash script to import all of the XML files.
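A rough sketch of what that import script might look like is below. It assumes the `mysql` command-line client is installed, that credentials come from an option file such as `~/.my.cnf`, that the server permits `LOCAL INFILE`, and that file names contain no single quotes. The function names and the database name `mydb` are placeholders, not anything from my actual setup.

```shell
#!/bin/sh
# Build the LOAD XML statement for one file, using the table and row tag
# from the query above.
build_load_sql() {
    printf "LOAD XML LOCAL INFILE '%s' INTO TABLE products ROWS IDENTIFIED BY '<product>';" "$1"
}

# Import every .xml file in a directory.
#   $1: directory holding the cleaned files (e.g. processed)
#   $2: database name (placeholder, e.g. mydb)
import_all() {
    for file in "$1"/*.xml
    do
        [ -e "$file" ] || continue    # skip when the glob matches nothing
        echo "Importing $file"
        # --local-infile=1 enables LOCAL INFILE on the client side
        mysql --local-infile=1 "$2" -e "$(build_load_sql "$file")" || return 1
    done
}
```

After sourcing or pasting these functions, `import_all processed mydb` would load each file in turn and stop at the first failure.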

Hope this helps someone.
