I'm getting data loss when doing a csv import using the Python MySQLdb module. The crazy thing is that I can load the exact same csv using other MySQL clients and it works fine.
It's dropping about 10 rows from my 7019-row csv.
The command I'm calling: LOAD DATA LOCAL INFILE '/path/to/load.txt' REPLACE INTO TABLE tble_name FIELDS TERMINATED BY ","
When the above command is run using the native mysql client on Linux, or the Sequel Pro MySQL client on Mac, it works fine and I get 7019 rows imported.
When the above command is run using Python's MySQLdb module, like so:
dest_cursor.execute( '''LOAD DATA LOCAL INFILE '/path/to/load.txt' REPLACE INTO TABLE tble_name FIELDS TERMINATED BY ","''' )
dest_db.commit()
Most rows are imported, but I get a slew of warnings like: Warning: (1265L, "Data truncated for column '<various_column_name>' at row <various_row_num>")
And sure enough, when it's done, my target table is missing some rows.
Unfortunately, with over 7,000 rows in the csv, it's hard to tell exactly which line it's choking on for further analysis. The warnings state at row <row_num>, but I'm not seeing that correlate to a row in the csv (I think it refers to the row being created in the target table, not the row in the csv), so I can't use that to help troubleshoot.
There are many rows that contain nulls and/or empty spaces, but they are importing fine.
The fact that I can import the entire csv using other MySQL clients makes me feel that the MySQLdb module is not configured right or something.
This is Python 2.7. Any help is appreciated. Any ideas on how to get better visibility into which line it's choking on would be especially helpful.
To further help, I would ask you the following.
Is the server reporting warnings at all? (If so, this should show you the errors, as it might be silently failing.)
SELECT @@GLOBAL.SQL_WARNINGS;
SELECT @@GLOBAL.SQL_MODE;
Does your data contain double-quotes ("), commas, or anything else that may get caught in translation between bash/python/mysql? (You mentioned Python 2.7.)
What server version are you running?
SELECT @@GLOBAL.VERSION;
Query:
SELECT DISTINCT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME
FROM INFORMATION_SCHEMA.SCHEMATA
WHERE (
SCHEMA_NAME <> 'sys' AND
SCHEMA_NAME <> 'mysql' AND
SCHEMA_NAME <> 'information_schema' AND
SCHEMA_NAME <> '.mysqlworkbench' AND
SCHEMA_NAME <> 'performance_schema'
);
Query:
SELECT DISTINCT ENGINE, TABLE_COLLATION FROM INFORMATION_SCHEMA.TABLES
WHERE (
TABLE_SCHEMA <> 'sys' AND
TABLE_SCHEMA <> 'mysql' AND
TABLE_SCHEMA <> 'information_schema' AND
TABLE_SCHEMA <> '.mysqlworkbench' AND
TABLE_SCHEMA <> 'performance_schema'
);
Query:
SELECT DISTINCT CHARACTER_SET_NAME, COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS
WHERE (
TABLE_SCHEMA <> 'sys' AND
TABLE_SCHEMA <> 'mysql' AND
TABLE_SCHEMA <> 'information_schema' AND
TABLE_SCHEMA <> '.mysqlworkbench' AND
TABLE_SCHEMA <> 'performance_schema'
);
For the connection collation/character set:
SHOW VARIABLES
WHERE VARIABLE_NAME LIKE 'CHARACTER\_SET\_%' OR
VARIABLE_NAME LIKE 'COLLATION%';
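If the client-side values from SHOW VARIABLES disagree with the server-side ones, you can pin them at connect time. A sketch of the connection settings, under assumptions: the host/user/passwd/db values are hypothetical placeholders, while the kwarg names (`charset`, `local_infile`) are real MySQLdb.connect parameters — `local_infile=1` is also required for LOAD DATA LOCAL INFILE to work at all:

```python
# Hypothetical connection settings; kwarg names follow MySQLdb.connect.
conn_kwargs = {
    'host': 'localhost',       # placeholder
    'user': 'loader',          # placeholder
    'passwd': 'secret',        # placeholder
    'db': 'target_db',         # placeholder
    'charset': 'utf8',         # pin client charset to match the server/table
    'local_infile': 1,         # required for LOAD DATA LOCAL INFILE
}
# dest_db = MySQLdb.connect(**conn_kwargs)
# dest_cursor = dest_db.cursor()
print(sorted(conn_kwargs))
```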
If the first two clients work without error, then I'm leaning toward, but not ruling out, problems with any of the following:
- Python connection configuration issues
- python/bash runtime interpolation of symbols causing a random hidden gem
- db collation not set to handle foreign languages
- exceeding the maximum field lengths
- issues with the data, as I mentioned above, with double-quotes and commas; I also forgot to mention newlines for Windows or Linux (carriage return or newline)
All in all, there is a lot to look at, and I'd need more information to further assist.
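The newline item above is quick to rule out: inspect the raw bytes of the file and see which convention it actually uses. A small sketch (the helper name is mine); the returned token can then be spliced into an explicit LINES TERMINATED BY clause:

```python
# Detect which line ending the csv uses, so LOAD DATA can be given a
# matching LINES TERMINATED BY value instead of relying on the default.
def line_terminator(path):
    with open(path, 'rb') as f:
        chunk = f.read()
    # Return the escaped token as it would appear in the SQL statement.
    return '\\r\\n' if b'\r\n' in chunk else '\\n'

# e.g. ... FIELDS TERMINATED BY ',' LINES TERMINATED BY '\r\n'
```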
Please update your question when you have more information and I will do the same for my answer to help you resolve your error.
Hope this helps and all goes well!
Your error,
Warning: (1265L, "Data truncated for column
leads me to believe it is the double-quotes around your field values. Check to make sure your data does NOT have commas inside the fields that errored out. Commas inside unquoted fields will cause your data to shift columns when running from the command line. The GUI clients are "smart enough", per se, to deal with this, but the command line is literal!
This is an embarrassing one but maybe I can help someone in the future making horrible mistakes like I have.
I spent a lot of time analyzing fields, checking for special characters, etc and it turned out I was simply causing the problem myself.
I had spaces in the csv and was NOT using ENCLOSED BY
in the load statement. This meant I was adding a space character to some fields, causing an overflow. The data looked like value1, value2, value3
when it should have been value1,value2,value3
. Removing those spaces, putting quotes around the fields, and enforcing ENCLOSED BY
in my statement fixed this. I assume the clients that were working were sanitizing the data behind the scenes or something. I really don't know for sure why it was working elsewhere with the same csv, but that got me through the first set of hurdles.
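For reference, here's a sketch of what the corrected statement looks like, built as a Python string the same way as in the question (path and table name are the question's placeholders; the LINES TERMINATED BY clause is my addition and should match the file's actual line endings):

```python
# ENCLOSED BY '"' tells MySQL to strip the double quotes wrapping each
# field, so commas or spaces inside quoted fields no longer shift columns.
load_sql = (
    "LOAD DATA LOCAL INFILE '/path/to/load.txt' "
    "REPLACE INTO TABLE tble_name "
    "FIELDS TERMINATED BY ',' ENCLOSED BY '\"' "
    "LINES TERMINATED BY '\\n'"
)
# dest_cursor.execute(load_sql)
# dest_db.commit()
print(load_sql)
```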
Then, after getting through that, the last line in the csv was choking with Row doesn't contain data for all columns
- it turns out I hadn't close()
d the file after creating it, before attempting to load it, so there was still a lock on the file. Once I added the close()
call and fixed the spacing issue, all the data is loading now.
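A sketch of regenerating the csv so both problems above can't recur (the path and sample rows are hypothetical): csv.QUOTE_ALL wraps every field in double quotes, pairing with ENCLOSED BY '"' in the load statement, and the with-block guarantees the file is flushed and closed before LOAD DATA ever sees it.

```python
import csv

rows = [
    ('value1', 'value2', 'value3'),
    ('has, comma', '', 'plain'),      # embedded comma stays in one field
]
with open('/tmp/load.txt', 'w') as f:  # hypothetical path
    writer = csv.writer(f, quoting=csv.QUOTE_ALL)
    for row in rows:
        writer.writerow(row)
# File is closed here -- safe to hand to LOAD DATA LOCAL INFILE.
```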
Sorry for anyone that spent any measure of time looking into this issue for me.