
FIELD TERMINATED BY in MySQL LOAD DATA INFILE

I have a function that gives me data about updates/inserts done on a DynamoDB table. For each upsert, I need to parse the data and map it to the corresponding MySQL table schema. I write this data to a file and execute the LOAD DATA INFILE statement provided by MySQL.

My statement looks something like this:

"LOAD DATA FROM S3 FILE '%s' REPLACE INTO TABLE %s FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'"

And each line in the file might look like this.

orderNumber123, Mr. ABC, 5th Street New York, 100, 12-12-17

However, the problem is that some of the fields themselves contain a comma (','). This causes problems because MySQL interprets every comma as a field terminator.

A faulty line may look like this.

orderNumber456, Mr. XYZ, 3rd Avenue, New Jersey, 100, 12-12-17
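To make the failure concrete, here is a rough sketch that splits both sample lines on the comma, which is roughly what a plain FIELDS TERMINATED BY ',' does when no enclosing character is given. It only approximates MySQL's behaviour; the variable names are mine.

```python
# Naive comma splitting, approximating FIELDS TERMINATED BY ','
# with no enclosing character configured.
good = "orderNumber123, Mr. ABC, 5th Street New York, 100, 12-12-17"
bad = "orderNumber456, Mr. XYZ, 3rd Avenue, New Jersey, 100, 12-12-17"

good_fields = good.split(",")
bad_fields = bad.split(",")

print(len(good_fields))  # 5 fields, matching the table's columns
print(len(bad_fields))   # 6 fields: the address was cut in two
```

The extra field pushes every later value one column to the right, so the amount and the date end up in the wrong columns.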

What field terminator can I provide to avoid this problem? I understand that there is no way to completely prevent this situation, but I'm asking what the best way is to make it very unlikely.

I have thought about using tab but that could also be part of the data.

UPDATE:

Following the answer provided by Ike Walker, enclosing the fields in double quotes does the trick. Of course this means I have to decorate my data even further, but I suspect that is the only guaranteed way.

Also, if there are any quotes within the field, MySQL is smart enough not to treat them as the enclosing character unless they are followed by the terminating character (so in our case ", would be the cue for the end of a field). Unfortunately, I have data where this pattern is part of a field. For example:

{type:long, range: "LONG","INT", amount:100}

To make MySQL treat this as a single field, I had to replace each double quote with two double quotes.

{type:long, range: ""LONG"",""INT"", amount:100}
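A minimal sketch of that decoration step, assuming the file is produced from Python. The helper name `enclose` is made up for illustration. It does not deal with backslashes, which LOAD DATA treats as the escape character by default, so data containing them would need extra handling.

```python
def enclose(value: str, quote: str = '"') -> str:
    """Double every quote character inside value, then wrap it in quotes."""
    return quote + value.replace(quote, quote * 2) + quote


raw = '{type:long, range: "LONG","INT", amount:100}'
print(enclose(raw))
# "{type:long, range: ""LONG"",""INT"", amount:100}"
```

Doubling first and wrapping second means the only undoubled quotes left in the output are the two enclosing ones.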

More about this, from the MySQL documentation for LOAD DATA:

If the field begins with the ENCLOSED BY character, instances of that character are recognized as terminating a field value only if followed by the field or line TERMINATED BY sequence. To avoid ambiguity, occurrences of the ENCLOSED BY character within a field value can be doubled and are interpreted as a single instance of the character. For example, if ENCLOSED BY '"' is specified, quotation marks are handled as shown here:

"The ""BIG"" boss"  -> The "BIG" boss
The "BIG" boss      -> The "BIG" boss
The ""BIG"" boss    -> The ""BIG"" boss
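For a quick local check of these three cases, Python's csv module in its default dialect happens to apply the same doubling rule. This is only an illustration: csv.reader is not MySQL's parser and the two differ on other inputs, so treat it as a sanity check rather than a specification.

```python
import csv

samples = [
    '"The ""BIG"" boss"',
    'The "BIG" boss',
    'The ""BIG"" boss',
]

# csv.reader accepts any iterable of lines; each sample parses as a one-field row.
parsed = [row[0] for row in csv.reader(samples)]

for before, after in zip(samples, parsed):
    print(f"{before:<20} -> {after}")
```

Only the first sample starts with the quote character, so it is the only one where the doubled quotes are collapsed.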

The typical solution here is to enclose values in quotation marks, at least when the value contains the field separator.

For example you could format your input like this:

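foo,"hi, I am a value with a comma",bar

Note that there is no space between the comma and the opening quote: the quote only counts as an enclosing character when it is the first character of the field.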

Then when you load your data you can include this in the LOAD DATA INFILE statement:

FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
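If the file is generated from Python, something like the following sketch could produce input in that shape. csv.QUOTE_MINIMAL only quotes values that contain the separator, the quote character or a line break, which lines up with OPTIONALLY ENCLOSED BY. The S3 URI and the table name are placeholders, and the clause is spelled FIELDS as in MySQL's grammar.

```python
import csv
import io

rows = [
    ["orderNumber123", "Mr. ABC", "5th Street New York", "100", "12-12-17"],
    ["orderNumber456", "Mr. XYZ", "3rd Avenue, New Jersey", "100", "12-12-17"],
]

# Write to an in-memory buffer; a real job would write to a file and upload it.
buffer = io.StringIO()
writer = csv.writer(
    buffer,
    delimiter=",",
    quotechar='"',
    quoting=csv.QUOTE_MINIMAL,  # quote only when a value needs it
    lineterminator="\n",
)
writer.writerows(rows)
data = buffer.getvalue()

# Placeholder URI and table name, purely for illustration.
statement = (
    "LOAD DATA FROM S3 FILE '{uri}' REPLACE INTO TABLE {table} "
    "FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"' "
    "LINES TERMINATED BY '\\n'"
).format(uri="s3://example-bucket/orders.csv", table="orders")

print(data, end="")
print(statement)
```

The writer also doubles any quote characters inside a value, so the manual replacement step is not needed when going this route.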
