
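CSV LOAD DATA INFILE leads to a database triple the size of the CSV
I have four CSV files, 200 GB in all, and I am trying to load each one into its own table in a database. The problem is with one particular file (100 GB): when I use LOAD DATA INFILE, the database grows to 250 GB. I know the MyISAM storage engine can cause this, but I am using InnoDB, and even with InnoDB I end up with a 204 GB file. Below are my current my.cnf and php.ini configurations; I am fairly sure I have messed something up.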
# The MySQL database server configuration file.
#
# You can copy this to one of:
# - "/etc/mysql/my.cnf" to set global options,
# - "~/.my.cnf" to set user-specific options.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html
# This will be passed to all MySQL clients
# It has been reported that passwords should be enclosed with ticks/quotes
# especially if they contain "#" chars...
# Remember to edit /etc/mysql/debian.cnf when changing the socket location.
# Here is entries for some specific programs
# The following values assume you have at least 32M ram
[mysqld_safe]
socket = /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
#
# * Basic Settings
#
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket = /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
bind-address = 127.0.0.1
#
# * Fine Tuning
#
key_buffer_size = 1G
max_allowed_packet = 512M
thread_stack = 192K
thread_cache_size = 128M
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam-recover-options = BACKUP
#max_connections = 100
#table_open_cache = 64
#thread_concurrency = 10
#
# * Query Cache Configuration
#
query_cache_limit = 8M
query_cache_size = 128M
#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!
#general_log_file = /var/log/mysql/mysql.log
#general_log = 1
#
# Error log - should be very few entries.
#
log_error = /var/log/mysql/error.log
#
# Here you can see queries with especially long duration
#slow_query_log = 1
#slow_query_log_file = /var/log/mysql/mysql-slow.log
#long_query_time = 2
#log-queries-not-using-indexes
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
# other settings you may need to change.
#server-id = 1
#log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 10
max_binlog_size = 100M
#binlog_do_db = include_database_name
#binlog_ignore_db = include_database_name
#
# * InnoDB
#
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
# Read the manual for more InnoDB related options. There are many!
#
# * Security Features
#
# Read the manual, too, if you want chroot!
# chroot = /var/lib/mysql/
#
# For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
#
# ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem
# ssl-key=/etc/mysql/server-key.pem
secure_file_priv=""
innodb_doublewrite = 0
innodb_support_xa = 0
innodb_buffer_pool_size = 10G
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 0
Here is the php.ini file:
https://gist.github.com/PhenomAmd/9e5940ee62bd12fa5d25609d93d2119e
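The strangest part is that I deleted all the other files and kept only this 100 GB file, and it still gives me a 280 GB database. The VPS has 16 GB of RAM and a 400 GB SSD. I know the data will fit on the server because I have done this before. The command I am using is: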
LOAD DATA INFILE '/var/lib/mysql-files/trade.csv'
INTO TABLE trade
CHARACTER SET latin1
FIELDS TERMINATED BY ','
ENCLOSED BY "'"
LINES TERMINATED BY '\n'
IGNORE 4 LINES (fid,serial_num,file_since_dt,bureau_id,member_kob,member_code,member_short_name,member_area_code,member_phone_num,acct_num,account_status,owner_indic,posted_dt,pref_cust_code,acct_type,contract_type,terms_num_paymts,terms_frequency,terms_amt,opened_dt,last_paymt_dt,last_purchased_dt,closed_dt,reporting_dt,reporting_mode,paid_off_dt,collateral,currency_code,high_credit_amt,cur_balance_amt,credit_limit,amt_past_due,paymt_pat_hst,paymt_pat_str_dt,paymt_pat_end_dt,cur_mop_status,remarks_code,restruct_dt,suppress_set_dt,suppress_expir_dt,max_delinqncy_amt,max_delinqncy_dt,max_delinqncy_mop,num_paymts_late,num_months_review,num_paymts_30_day,num_paymts_60_day,num_paymts_90_day,num_paymts_120_day,appraise_value,first_no_payment_dt,saldo_insoluto,last_paymt_amt,crc_indic,plazo_meses,monto_credito_original,last_past_due_dt,interest_amt,cur_interest_mop,days_past_due,email);
SHOW TABLE STATUS FROM database_name
+-------+--------+---------+------------+-----------+----------------+--------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+-------+--------+---------+------------+-----------+----------------+--------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+---------+
| trade | InnoDB | 10 | Dynamic | 210002438 | 509 | 107098406912 | 0 | 0 | 7340032 | NULL | 2021-07-02 11:11:47 | NULL | NULL | latin1_general_ci | NULL | | |
+-------+--------+---------+------------+-----------+----------------+--------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+---------+
1 row in set (0.00 sec)
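For reference, the Data_length above is 107098406912 bytes, which is about 99.7 GiB, and Index_length is 0. A minimal sketch of a query that reports the same figures in GiB follows. It assumes the schema is buronal (the db name shown in the processlist); note that the row count InnoDB reports here is only an estimate.

```sql
-- Per-table size in GiB, read from information_schema.
-- 'buronal' is assumed to be the schema that holds the trade table.
SELECT table_name,
       engine,
       table_rows,                                         -- an estimate for InnoDB
       ROUND(data_length  / POW(1024, 3), 1) AS data_gib,
       ROUND(index_length / POW(1024, 3), 1) AS index_gib,
       ROUND(data_free    / POW(1024, 3), 1) AS free_gib
FROM information_schema.tables
WHERE table_schema = 'buronal'
ORDER BY data_length DESC;
```

If data_gib stays near 100 while the disk usage is far higher, it is worth comparing this figure against the files in the datadir to see where the remaining space goes.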
SHOW PROCESSLIST shows the same query three times, and I don't know why:
+-----+------------+-----------+---------+---------+------+--------------+------------------------------------------------------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+-----+------------+-----------+---------+---------+------+--------------+------------------------------------------------------------------------------------------------------+
| 112 | phpmyadmin | localhost | NULL | Sleep | 4810 | | NULL |
| 113 | root | localhost | buronal | Query | 4810 | executing | LOAD DATA INFILE '/var/lib/mysql-files/trade.csv'
INTO TABLE trade
CHARACTER SET latin1
FIELDS |
| 182 | phpmyadmin | localhost | NULL | Sleep | 4128 | | NULL |
| 183 | root | localhost | buronal | Query | 4128 | executing | LOAD DATA INFILE '/var/lib/mysql-files/trade.csv'
INTO TABLE trade
CHARACTER SET latin1
FIELDS |
| 250 | phpmyadmin | localhost | NULL | Sleep | 3446 | | NULL |
| 251 | root | localhost | buronal | Query | 3446 | executing | LOAD DATA INFILE '/var/lib/mysql-files/trade.csv'
INTO TABLE trade
CHARACTER SET latin1
FIELDS |
| 484 | phpmyadmin | localhost | NULL | Sleep | 755 | | NULL |
| 485 | root | localhost | buronal | Query | 755 | Sending data | SELECT * FROM `trade` LIMIT 0, 25 |
| 526 | root | localhost | NULL | Query | 0 | starting | show processlist |
| 551 | phpmyadmin | localhost | NULL | Sleep | 73 | | NULL |
| 552 | root | localhost | buronal | Query | 73 | Sending data | SELECT * FROM `trade` LIMIT 0, 25 |
+-----+------------+-----------+---------+---------+------+--------------+------------------------------------------------------------------------------------------------------+
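The processlist above shows three LOAD DATA statements (Ids 113, 183 and 251) in the executing state at the same time, all reading the same file into the same table. If those are accidental re-submissions (an assumption; nothing in the output says how they were started), each one that completes would add its own copy of the rows unless a primary or unique key rejects the duplicates. A sketch for listing such statements and stopping the extras:

```sql
-- List every LOAD DATA statement that is currently running.
SELECT id, user, db, time, state, LEFT(info, 60) AS statement_start
FROM information_schema.processlist
WHERE info LIKE 'LOAD DATA%'
ORDER BY time DESC;

-- Stop the later duplicates, keeping the oldest one (Id 113 in the output above).
-- Killing an InnoDB load makes it roll back, which can itself take a long time.
KILL 183;
KILL 251;
```

The Ids are only valid for the session shown above; on a live server they should be taken from the result of the first query.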
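Sounds like:
- BIGINTs for columns of small numbers: "1,2,3,4" in a CSV averages 2 bytes per number, while BIGINT is always 8 bytes plus overhead.
- CHAR vs VARCHAR is another potential problem. CHAR(255) takes 255 bytes in latin1; VARCHAR(255) takes only the space needed (plus a 1- or 2-byte length prefix). Also, InnoDB handles CHAR differently than MyISAM.
- DECIMAL
- DOUBLE vs FLOAT
- INT for boolean flags
Please provide SHOW CREATE TABLE so we can diagnose further.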
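To check for the datatype issues listed above, the column types can also be read from information_schema. This is a sketch that assumes the schema name buronal from the processlist. The ALTER at the end uses made-up target types purely for illustration, since the real column definitions are not shown in the question.

```sql
-- Full table definition.
SHOW CREATE TABLE trade;

-- Or list only the columns whose types are called out above.
SELECT column_name, column_type
FROM information_schema.columns
WHERE table_schema = 'buronal'
  AND table_name = 'trade'
  AND data_type IN ('bigint', 'char', 'decimal', 'double', 'int')
ORDER BY ordinal_position;

-- Hypothetical example only: the target types are assumptions, not taken from the question.
-- Check the real value ranges first; changing a column type rebuilds the table,
-- so batch all changes into a single ALTER.
ALTER TABLE trade
  MODIFY num_paymts_late SMALLINT UNSIGNED,
  MODIFY member_short_name VARCHAR(64);
```

On a table this size it may be cheaper to fix the types in the CREATE TABLE statement and reload the CSV than to alter the loaded table.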
Your my.cnf looks tight for 16 GB if you are using both MyISAM and InnoDB. Split the memory between the two settings: key_buffer_size and innodb_buffer_pool_size.
Once you realize you should be using only InnoDB: key_buffer_size=50M and innodb_buffer_pool_size=10G.
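A sketch of how those two settings could be checked and changed from a client session. It assumes MySQL 5.7.5 or later, where innodb_buffer_pool_size can be resized at runtime; to survive a restart, the same values also need to be set in the [mysqld] section of my.cnf.

```sql
-- Which storage engines are actually in use by user tables?
SELECT engine, COUNT(*) AS table_count
FROM information_schema.tables
WHERE table_schema NOT IN ('mysql', 'information_schema', 'performance_schema', 'sys')
GROUP BY engine;

-- Current values, in MiB.
SELECT @@key_buffer_size / 1024 / 1024 AS key_buffer_mib,
       @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mib;

-- If only InnoDB is in use: shrink the MyISAM key cache and keep the pool at 10G.
SET GLOBAL key_buffer_size = 50 * 1024 * 1024;
SET GLOBAL innodb_buffer_pool_size = 10 * 1024 * 1024 * 1024;
```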