
LOAD DATA INFILE - fields terminated by character which also appears in field

I have a large .csv file that I want to import into a MySQL database. I want to use the LOAD DATA INFILE statement because of its speed.

Fields are terminated by -|- and lines are terminated by |--. Currently I am using the following statement:

LOAD DATA LOCAL INFILE 'C:\\test.csv' INTO TABLE mytable FIELDS TERMINATED BY '-|-' LINES TERMINATED BY '|--'

Most rows look something like this: (Note that the strings are not enclosed by any characters.)

goodstring-|--|-goodstring-|-goodstring-|-goodstring|--
goodstring-|--|-goodstring-|-goodstring-|-|--
goodstring-|-goodstring-|-goodstring-|-goodstring-|-|--

goodstring is a string that does not contain - as a character. As you can see, the second or last column may be empty. Rows like the ones above do not cause any problems. However, the last column may contain - characters, so there might be a row that looks something like this:

goodstring-|--|-goodstring-|-goodstring-|---|--

The string -- in the last column causes problems. MySQL detects six instead of five columns. It inserts a single - character into the fifth column and truncates the sixth. The correct DB row should be ("goodstring", NULL, "goodstring", "goodstring", "--").
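To make the misparse concrete, here is a minimal Python sketch that matches the two terminators left to right over a single row. It is only an emulation under that assumption, not MySQL's actual parser, and `scan` is a hypothetical helper name.

```python
from typing import List

FIELD_SEP = "-|-"   # FIELDS TERMINATED BY
LINE_END = "|--"    # LINES TERMINATED BY


def scan(data: str) -> List[List[str]]:
    """Split data into rows of fields, matching terminators left to right."""
    rows, fields, buf = [], [], ""
    i = 0
    while i < len(data):
        if data.startswith(LINE_END, i):      # end of row
            fields.append(buf)
            rows.append(fields)
            fields, buf = [], ""
            i += len(LINE_END)
        elif data.startswith(FIELD_SEP, i):   # end of field
            fields.append(buf)
            buf = ""
            i += len(FIELD_SEP)
        else:                                 # ordinary data character
            buf += data[i]
            i += 1
    if buf or fields:                         # unterminated trailing row
        fields.append(buf)
        rows.append(fields)
    return rows


good = "goodstring-|--|-goodstring-|-goodstring-|-goodstring|--"
bad = "goodstring-|--|-goodstring-|-goodstring-|---|--"

print(scan(good)[0])  # five fields
print(scan(bad)[0])   # six fields; the fifth holds a single "-"
```

In the bad row the second - of the value, the | and the first - of the line terminator together look like a field terminator, so the real line terminator is never seen.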

A solution would be to tell MySQL to treat everything after the fourth field terminator as part of the fifth column (up until the line is terminated). Is this possible with LOAD DATA INFILE? Are there other methods that yield the same result, do not require the source file to be edited, and perform about as fast as LOAD DATA INFILE?
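The desired behaviour, stop splitting after the fourth terminator, can be pinned down with a short Python sketch. It only illustrates the intended semantics; it is not something LOAD DATA INFILE offers, it assumes one record per string, and `split_row` is a hypothetical helper name.

```python
from typing import List

FIELD_SEP = "-|-"
LINE_END = "|--"


def split_row(line: str, n_columns: int = 5) -> List[str]:
    """Split a single record into at most n_columns fields."""
    # Remove the line terminator, if present.
    if line.endswith(LINE_END):
        line = line[: -len(LINE_END)]
    # maxsplit stops after n_columns - 1 separators; the rest stays in the last field.
    return line.split(FIELD_SEP, n_columns - 1)


print(split_row("goodstring-|--|-goodstring-|-goodstring-|---|--"))
```

Here the fifth field keeps both - characters, because the line terminator is removed before any field splitting happens.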

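This is my solution. It assumes every row in the file ends with |-- followed by \r\n, and it uses an extra helper column col6 in the table. Because the line terminator is now '-\r\n', the fifth field normally arrives with a trailing |- that has to be stripped, while a -- value is cut after its first - and leaves an empty string in col6: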

LOAD DATA
LOCAL INFILE 'C:\\test.csv'
INTO TABLE mytable
FIELDS TERMINATED BY '-|-'
LINES TERMINATED BY '-\r\n'
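(col1, col2, col3, col4, @col5, col6)
-- SET must assign to a column, so the raw fifth field is read into @col5 and the result stored in col5.
-- col6 is non-NULL only when a "--" value was split; otherwise strip the trailing "|-".
SET col5 = (SELECT CASE WHEN col6 IS NOT NULL THEN CONCAT(@col5, '-') ELSE LEFT(@col5, LENGTH(@col5) - 2) END);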

It turns a row like this one:

goodstring-|--|-goodstring-|-goodstring-|-|--

into this (the sixth value is the helper column col6):

("goodstring", "", "goodstring", "goodstring", "", NULL)

And a bad row like this one:

goodstring-|--|-goodstring-|-goodstring-|---|--

into this:

("goodstring", "", "goodstring", "goodstring", "--", "")

I simply drop the last column after the import.
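For anyone who wants to check the logic without a database, the following Python sketch emulates how I read the statement above. It assumes left-to-right terminator matching, rows that end in |-- followed by \r\n, at least five fields per row, and a missing sixth field loading as NULL (None here). `read_fields` and `load_row` are hypothetical helper names, not MySQL functions.

```python
from typing import List, Optional, Tuple

FIELD_SEP = "-|-"      # FIELDS TERMINATED BY
LINE_END = "-\r\n"     # LINES TERMINATED BY


def read_fields(raw: str) -> List[str]:
    """Read the fields of the first row in raw, matching terminators left to right."""
    fields, buf, i = [], "", 0
    while i < len(raw):
        if raw.startswith(LINE_END, i):       # end of row
            break
        if raw.startswith(FIELD_SEP, i):      # end of field
            fields.append(buf)
            buf = ""
            i += len(FIELD_SEP)
        else:                                 # ordinary data character
            buf += raw[i]
            i += 1
    fields.append(buf)
    return fields


def load_row(raw: str) -> Tuple[Optional[str], ...]:
    """Return (col1, col2, col3, col4, col5, col6) for one raw line."""
    fields: List[Optional[str]] = list(read_fields(raw))
    fields += [None] * (6 - len(fields))      # assumption: missing field -> NULL
    col1, col2, col3, col4, var5, col6 = fields[:6]
    if col6 is not None:
        col5 = var5 + "-"                     # CONCAT(@col5, '-')
    else:
        col5 = var5[: max(len(var5) - 2, 0)]  # LEFT(@col5, LENGTH(@col5) - 2)
    return (col1, col2, col3, col4, col5, col6)


print(load_row("goodstring-|--|-goodstring-|-goodstring-|-goodstring|--\r\n"))
print(load_row("goodstring-|--|-goodstring-|-goodstring-|-|--\r\n"))
print(load_row("goodstring-|--|-goodstring-|-goodstring-|---|--\r\n"))
```

The sketch only covers the three row shapes from the question; a last column with more than two - characters, or with - in other positions, is not handled by it.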
