
MySQL load ignores some records

I have this CSV file with roughly 16,916 records. When I load it into MySQL, it only detects 15,945 records. This is what MySQL reports:

Records: 15945  Deleted: 0  Skipped: 0  Warnings: 0

Can anyone tell me why MySQL ignores some records and how I can fix this?

I load the file with a LOAD DATA statement, as shown below:

LOAD DATA LOCAL INFILE 'germany-filtered.csv'
INTO TABLE point_of_interest
FIELDS TERMINATED BY ','
    ENCLOSED BY '"'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(osm_id,lat,lng,access,addr_housename,addr_housenumber,addr_interpolation,admin_level,aerialway,aeroway,amenity,area,barrier,bicycle,brand,bridge,boundary,building,capital,construction,covered,culvert,cutting,denomination,disused,ele,embankment,foot,generator_source,harbour,highway,historic,horse,intermittent,junction,landuse,layer,leisure,ship_lock,man_made,military,motorcar,name,osm_natural,office,oneway,operator,place,poi,population,power,power_source,public_transport,railway,ref,religion,route,service,shop,sport,surface,toll,tourism,tower_type,tunnel,water,waterway,wetland,width,wood);
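
For reference, a quick way to cross-check the two counts from the shell (a sketch of my own; it assumes the mysql command-line client is available, and "root"/"mydatabase" are placeholder credentials and database name):

# data lines in the file (excluding the header line)
echo $(( $(wc -l < germany-filtered.csv) - 1 ))

# rows that actually ended up in the table
mysql -N -u root -p -e 'SELECT COUNT(*) FROM point_of_interest' mydatabase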

This is the database schema I use:

CREATE TABLE point_of_interest (
    `poi_id` int(10) unsigned NOT NULL auto_increment,
    `lat` DECIMAL(10, 8) default NULL,
    `lng` DECIMAL(11, 8) default NULL,
    PRIMARY KEY  (`poi_id`),
    KEY `lat` (`lat`),
    KEY `lng` (`lng`),
    osm_id BIGINT,
    access TEXT,
    addr_housename TEXT,
    addr_housenumber TEXT,
    addr_interpolation TEXT,
    admin_level TEXT,
    aerialway TEXT,
    aeroway TEXT,
    amenity TEXT,
    area TEXT,
    barrier TEXT,
    bicycle TEXT,
    brand TEXT,
    bridge TEXT,
    boundary TEXT,
    building TEXT,
    capital TEXT,
    construction TEXT,
    covered TEXT,
    culvert TEXT,
    cutting TEXT,
    denomination TEXT,
    disused TEXT,
    ele TEXT,
    embankment TEXT,
    foot TEXT,
    generator_source TEXT,
    harbour TEXT,
    highway TEXT,
    historic TEXT,
    horse TEXT,
    intermittent TEXT,
    junction TEXT,
    landuse TEXT,
    layer TEXT,
    leisure TEXT,
    ship_lock TEXT,
    man_made TEXT,
    military TEXT,
    motorcar TEXT,
    name TEXT,
    osm_natural TEXT,
    office TEXT,
    oneway TEXT,
    operator TEXT,
    place TEXT,
    poi TEXT,
    population TEXT,
    power TEXT,
    power_source TEXT,
    public_transport TEXT,
    railway TEXT,
    ref TEXT,
    religion TEXT,
    route TEXT,
    service TEXT,
    shop TEXT,
    sport TEXT,
    surface TEXT,
    toll TEXT,
    tourism TEXT,
    tower_type TEXT,
    tunnel TEXT,
    water TEXT,
    waterway TEXT,
    wetland TEXT,
    width TEXT,
    wood TEXT
) ENGINE=InnoDB;

Update:

I have checked the first and the last record, and both are present. There are records with many empty values like this one:

1503898236,10.5271308,52.7468051,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,

Update 2:

These are some of the records I found to be missing from the database:

4228380062,9.9386752,53.6135468,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Dammwild,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4228278589,9.9391503,53.5960304,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Kaninchen,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4228278483,9.9396935,53.5960729,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Onager,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4226772791,8.8394263,54.1354887,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Familienlagune Perlebucht,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,

It seems that almost all records whose osm_id starts with 4 are missing. That is strange.
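
For reference, one way to reproduce this kind of check (a sketch; "root"/"mydatabase" are placeholders for the real credentials and database name) is to diff the osm_id column of the CSV against the IDs that made it into the table:

# osm_ids present in the CSV (skip the header line)
tail -n +2 germany-filtered.csv | cut -d',' -f1 | sort -u > csv_ids.txt

# osm_ids present in the table
mysql -N -u root -p -e 'SELECT osm_id FROM point_of_interest' mydatabase | sort -u > db_ids.txt

# IDs that are in the file but not in the database
comm -23 csv_ids.txt db_ids.txt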

Try this to see whether there are duplicate IDs in the file:

Show the file

# cat mycsv.csv
6991,10.4232704,49.4970160,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Bauernhaus aus Seubersdorf,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4228380062,9.9386752,53.6135468,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Dammwild,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4228278589,9.9391503,53.5960304,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Kaninchen,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4228278483,9.9396935,53.5960729,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Onager,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4226772791,8.8394263,54.1354887,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Familienlagune Perlebucht,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,
4228278589,9.9391503,53.5960304,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,Kaninchen,,,,,,,,,,,,,,,,,,,,attraction,,,,,,,

Count the lines

# wc -l mycsv.csv
6 mycsv.csv

Remove duplicate IDs and count again

# cut -d',' -f1 mycsv.csv | sort | uniq | wc -l
5
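
If the counts differ, the duplicated IDs themselves can be listed with uniq -d, for example:

# print only the first-column values that occur more than once
cut -d',' -f1 mycsv.csv | sort | uniq -d

For the sample file above this prints 4228278589.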

I did not find the reason why MySQL ignores some records, so I looked for workarounds. Two solutions worked for me:

Split the CSV file into multiple parts

split -l 10 file.csv

I found that if I split the CSV into several parts and load them into MySQL, every record is recognized. However, this only worked for me with very small files (~10 records per file), so this solution was not feasible for me.
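
A sketch of how the pieces could then be loaded in a loop (the chunk prefix part_, the load.tpl template file, and the "root"/"mydatabase" credentials are placeholders of my own):

# strip the header first, then split into 10-line chunks named part_aa, part_ab, ...
tail -n +2 file.csv | split -l 10 - part_

# load.tpl holds the LOAD DATA statement from above with the file name
# replaced by __FILE__ and without the IGNORE 1 LINES clause
for f in part_*; do
  sed "s/__FILE__/$f/" load.tpl | mysql --local-infile=1 -u root -p mydatabase
done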

Convert the CSV into MySQL INSERT statements

This part of a bash script converts the CSV file into an SQL file containing INSERT INTO statements:

cp file.csv inserts.sql
# replace empty CSV value with NULL
sed -r 's;^,|,$;NULL,;g
:l
s;,,;,NULL,;g
t l' -i inserts.sql

#replace " with '
sed -e ':a' -e 'N' -e '$!ba' -e 's/\"/\x27/g' -i inserts.sql

# enquote every value
sed 's/[^,][^,]*/"&"/g' -i inserts.sql

# replace ,, with ,NULL,NULL,
sed 's/,,/,NULL,NULL,/g' -i inserts.sql

# replace ,, with ,
sed 's/,,/,/g' -i inserts.sql

# add INSERT INTO table_name VALUES (NULL, before each line
# Note: The first value is NULL because its the primary key which is set from my table
sed 's/^/INSERT INTO table_name VALUES (NULL,/' -i inserts.sql

# add ); at the end of each line
sed 's/$/);/' -i inserts.sql

# replace ,); with );
sed 's/,);/);/g' -i inserts.sql

Note: I cannot guarantee that this solution works for every CSV file, so check the generated SQL file before using it.
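
Once inserts.sql has been generated and reviewed (and table_name replaced by the real table, point_of_interest here), it can be loaded like any other SQL script; "root" and "mydatabase" below are placeholders:

mysql -u root -p mydatabase < inserts.sql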
