简体   繁体   中英

MySQL load ignores some records

I have this CSV file with about 16.916 records. When I load this into a MySQL, it only detects 15.945 records. Thats what MySQL says:

Records: 15945  Deleted: 0  Skipped: 0  Warnings: 0

Can someone tell my why MySQL ignores some records and how I can fix this?

I load the file using the LOAD function like this:

LOAD DATA LOCAL INFILE 'germany-filtered.csv'
INTO TABLE point_of_interest
FIELDS TERMINATED BY ','
    ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(osm_id,lat,lng,access,addr_housename,addr_housenumber,addr_interpolation,admin_level,aerialway,aeroway,amenity,area,barrier,bicycle,brand,bridge,boundary,building,capital,construction,covered,culvert,cutting,denomination,disused,ele,embankment,foot,generator_source,harbour,highway,historic,horse,intermittent,junction,landuse,layer,leisure,ship_lock,man_made,military,motorcar,name,osm_natural,office,oneway,operator,place,poi,population,power,power_source,public_transport,railway,ref,religion,route,service,shop,sport,surface,toll,tourism,tower_type,tunnel,water,waterway,wetland,width,wood);

Thats the database schema I use:

CREATE TABLE point_of_interest (
    `poi_id` int(10) unsigned NOT NULL auto_increment,
    `lat` DECIMAL(10, 8) default NULL,
    `lng` DECIMAL(11, 8) default NULL,
    PRIMARY KEY  (`poi_id`),
    KEY `lat` (`lat`),
    KEY `lng` (`lng`),
    osm_id BIGINT,
    access TEXT,
    addr_housename TEXT,
    addr_housenumber TEXT,
    addr_interpolation TEXT,
    admin_level TEXT,
    aerialway TEXT,
    aeroway TEXT,
    amenity TEXT,
    area TEXT,
    barrier TEXT,
    bicycle TEXT,
    brand TEXT,
    bridge TEXT,
    boundary TEXT,
    building TEXT,
    capital TEXT,
    construction TEXT,
    covered TEXT,
    culvert TEXT,
    cutting TEXT,
    denomination TEXT,
    disused TEXT,
    ele TEXT,
    embankment TEXT,
    foot TEXT,
    generator_source TEXT,
    harbour TEXT,
    highway TEXT,
    historic TEXT,
    horse TEXT,
    intermittent TEXT,
    junction TEXT,
    landuse TEXT,
    layer TEXT,
    leisure TEXT,
    ship_lock TEXT,
    man_made TEXT,
    military TEXT,
    motorcar TEXT,
    name TEXT,
    osm_natural TEXT,
    office TEXT,
    oneway TEXT,
    operator TEXT,
    place TEXT,
    poi TEXT,
    population TEXT,
    power TEXT,
    power_source TEXT,
    public_transport TEXT,
    railway TEXT,
    ref TEXT,
    religion TEXT,
    route TEXT,
    service TEXT,
    shop TEXT,
    sport TEXT,
    surface TEXT,
    toll TEXT,
    tourism TEXT,
    tower_type TEXT,
    tunnel TEXT,
    water TEXT,
    waterway TEXT,
    wetland TEXT,
    width TEXT,
    wood TEXT
) ENGINE=InnoDB;

Update:

I already checked the first and last record but both exist. Also records with a lot of empty values like this do exist:

1503898236,10.5271308,52.7468051,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Update 2:

Those are the records I found which are missing in the database:

4228380062,9.9386752,53.6135468,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Dammwild,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4228278589,9.9391503,53.5960304,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Kaninchen,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4228278483,9.9396935,53.5960729,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Onager,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4226772791,8.8394263,54.1354887,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Familienlagune Perlebucht,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,

It seems that nearly all of the records with the osm_id starting with a 4 are missing. Thats strange.

try this to see if you have duplicate id in the file:

show the file

# cat mycsv.csv
6991,10.4232704,49.4970160,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bauernhaus aus Seubersdorf,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4228380062,9.9386752,53.6135468,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Dammwild,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4228278589,9.9391503,53.5960304,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Kaninchen,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4228278483,9.9396935,53.5960729,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Onager,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4226772791,8.8394263,54.1354887,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Familienlagune Perlebucht,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4228278589,9.9391503,53.5960304,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Kaninchen,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,

count the lines

# wc -l mycsv.csv
6 mycsv.csv

remove duplicate ids and count again

# cut -d',' -f1 mycsv.csv | sort | uniq | wc -l
5

I didn't found the reason why MySQL ignores some of the records, so I searched for workarounds. There are 2 solutions which work for me:

Split the CSV file into multiple parts

split -l 10 file.csv

I figured out that if I split the CSV into multiple parts and load them into MySQL that it recognizes every record. However, this only worked for me if the file was pretty small (~10 records/file). So this solution wasn't viable for me.

Convert the CSV into MySQL Insert Statements

This part of a bash script converts the csv file into a SQL file containing INSERT INTO clauses:

cp file.csv inserts.sql
# replace empty CSV value with NULL
sed -r 's;^,|,$;NULL,;g
:l
s;,,;,NULL,;g
t l' -i inserts.sql

#replace " with '
sed -e ':a' -e 'N' -e '$!ba' -e 's/\"/\x27/g' -i inserts.sql

# enquote every value
sed 's/[^,][^,]*/"&"/g' -i inserts.sql

# replace ,, with ,NULL,NULL,
sed 's/,,/,NULL,NULL,/g' -i inserts.sql

# replace ,, with ,
sed 's/,,/,/g' -i inserts.sql

# add INSERT INTO table_name VALUES (NULL, before each line
# Note: The first value is NULL because its the primary key which is set from my table
sed 's/^/INSERT INTO table_name VALUES (NULL,/' -i inserts.sql

# add ); at the end of each line
sed 's/$/);/' -i inserts.sql

# replace ,); with );
sed 's/,);/);/g' -i inserts.sql

Note: I do not guarantee that this solution works with all CSV files, so check the generated SQL file before using it.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM