
CSV: LOAD DATA INFILE leads to a database three times the size of the CSV

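I have four CSV files totaling 200 GB, and I'm trying to load them as separate tables in a database. The problem occurs with one particular file (100 GB): when I use LOAD DATA INFILE, the size of that database grows to 250 GB. I know the MyISAM storage engine can cause this, but I'm using InnoDB, and even with InnoDB it reached 204 GB. Below are my current my.cnf and php.ini configurations. I'm sure I've messed something up.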

# The MySQL database server configuration file.
#
# You can copy this to one of:
# - "/etc/mysql/my.cnf" to set global options,
# - "~/.my.cnf" to set user-specific options.
# 
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# For explanations see
# http://dev.mysql.com/doc/mysql/en/server-system-variables.html

# This will be passed to all MySQL clients
# It has been reported that passwords should be enclosed with ticks/quotes
# especially if they contain "#" chars...
# Remember to edit /etc/mysql/debian.cnf when changing the socket location.

# Here is entries for some specific programs
# The following values assume you have at least 32M ram

[mysqld_safe]
socket      = /var/run/mysqld/mysqld.sock
nice        = 0

[mysqld]
#
# * Basic Settings
#
user        = mysql
pid-file    = /var/run/mysqld/mysqld.pid
socket      = /var/run/mysqld/mysqld.sock
port        = 3306
basedir     = /usr
datadir     = /var/lib/mysql
tmpdir      = /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
#
# Instead of skip-networking the default is now to listen only on
# localhost which is more compatible and is not less secure.
bind-address        = 127.0.0.1
#
# * Fine Tuning
#
key_buffer_size     = 1G
max_allowed_packet  = 512M
thread_stack        = 192K
thread_cache_size       = 128M
# This replaces the startup script and checks MyISAM tables if needed
# the first time they are touched
myisam-recover-options  = BACKUP
#max_connections        = 100
#table_open_cache       = 64
#thread_concurrency     = 10
#
# * Query Cache Configuration
#
query_cache_limit   = 8M
query_cache_size        = 128M
#
# * Logging and Replication
#
# Both location gets rotated by the cronjob.
# Be aware that this log type is a performance killer.
# As of 5.1 you can enable the log at runtime!
#general_log_file        = /var/log/mysql/mysql.log
#general_log             = 1
#
# Error log - should be very few entries.
#
log_error = /var/log/mysql/error.log
#
# Here you can see queries with especially long duration
#slow_query_log     = 1
#slow_query_log_file    = /var/log/mysql/mysql-slow.log
#long_query_time = 2
#log-queries-not-using-indexes
#
# The following can be used as easy to replay backup logs or for replication.
# note: if you are setting up a replication slave, see README.Debian about
#       other settings you may need to change.
#server-id      = 1
#log_bin            = /var/log/mysql/mysql-bin.log
expire_logs_days    = 10
max_binlog_size   = 100M
#binlog_do_db       = include_database_name
#binlog_ignore_db   = include_database_name
#
# * InnoDB
#
# InnoDB is enabled by default with a 10MB datafile in /var/lib/mysql/.
# Read the manual for more InnoDB related options. There are many!
#
# * Security Features
#
# Read the manual, too, if you want chroot!
# chroot = /var/lib/mysql/
#
# For generating SSL certificates I recommend the OpenSSL GUI "tinyca".
#
# ssl-ca=/etc/mysql/cacert.pem
# ssl-cert=/etc/mysql/server-cert.pem
# ssl-key=/etc/mysql/server-key.pem
secure_file_priv=""
innodb_doublewrite = 0
innodb_support_xa = 0
innodb_buffer_pool_size = 10G
innodb_log_file_size = 512M
innodb_flush_log_at_trx_commit = 0

Here is the php.ini file:

https://gist.github.com/PhenomAmd/9e5940ee62bd12fa5d25609d93d2119e

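The strangest part is that I deleted all the other files and kept only this 100 GB one, and it still produces a 280 GB database. The VPS has 16 GB of RAM and a 400 GB SSD. I know the data fits on the server, because I have done this before. Finally, the command I'm using is this: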

LOAD DATA INFILE '/var/lib/mysql-files/trade.csv' 
INTO TABLE trade 
CHARACTER SET latin1
FIELDS TERMINATED BY ',' 
ENCLOSED BY "'" 
LINES TERMINATED BY '\n' 
IGNORE 4 LINES (fid,serial_num,file_since_dt,bureau_id,member_kob,member_code,member_short_name,member_area_code,member_phone_num,acct_num,account_status,owner_indic,posted_dt,pref_cust_code,acct_type,contract_type,terms_num_paymts,terms_frequency,terms_amt,opened_dt,last_paymt_dt,last_purchased_dt,closed_dt,reporting_dt,reporting_mode,paid_off_dt,collateral,currency_code,high_credit_amt,cur_balance_amt,credit_limit,amt_past_due,paymt_pat_hst,paymt_pat_str_dt,paymt_pat_end_dt,cur_mop_status,remarks_code,restruct_dt,suppress_set_dt,suppress_expir_dt,max_delinqncy_amt,max_delinqncy_dt,max_delinqncy_mop,num_paymts_late,num_months_review,num_paymts_30_day,num_paymts_60_day,num_paymts_90_day,num_paymts_120_day,appraise_value,first_no_payment_dt,saldo_insoluto,last_paymt_amt,crc_indic,plazo_meses,monto_credito_original,last_past_due_dt,interest_amt,cur_interest_mop,days_past_due,email);

SHOW TABLE STATUS FROM database_name;

+-------+--------+---------+------------+-----------+----------------+--------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+---------+
| Name  | Engine | Version | Row_format | Rows      | Avg_row_length | Data_length  | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation         | Checksum | Create_options | Comment |
+-------+--------+---------+------------+-----------+----------------+--------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+---------+
| trade | InnoDB |      10 | Dynamic    | 210002438 |            509 | 107098406912 |               0 |            0 |   7340032 |           NULL | 2021-07-02 11:11:47 | NULL        | NULL       | latin1_general_ci |     NULL |                |         |
+-------+--------+---------+------------+-----------+----------------+--------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-------------------+----------+----------------+---------+
1 row in set (0.00 sec)
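The figures in this output can be cross-checked with simple arithmetic. The sketch below uses Python only as a calculator: the constants are copied from the table above, while the helper names are hypothetical. Keep in mind that for InnoDB the Rows value is an estimate, so any per-row average is approximate.

```python
# Values copied from the SHOW TABLE STATUS output for `trade`.
ROWS = 210_002_438             # "Rows" (only an estimate for InnoDB)
DATA_LENGTH = 107_098_406_912  # "Data_length", bytes
INDEX_LENGTH = 0               # "Index_length", bytes (no secondary indexes)
DATA_FREE = 7_340_032          # "Data_free", bytes

GIB = 1024 ** 3


def avg_row_length(data_length: int, rows: int) -> int:
    """Average bytes per row; integer division reproduces the reported Avg_row_length."""
    return data_length // rows


def to_gib(n_bytes: int) -> float:
    """Convert bytes to GiB (2**30 bytes)."""
    return n_bytes / GIB


if __name__ == "__main__":
    print("avg row length:", avg_row_length(DATA_LENGTH, ROWS), "bytes")
    print(f"data:  {to_gib(DATA_LENGTH):.2f} GiB ({DATA_LENGTH / 1e9:.1f} GB)")
    print(f"index: {to_gib(INDEX_LENGTH):.2f} GiB")
    print(f"free:  {DATA_FREE / 1024 ** 2:.0f} MiB")
```

Dividing Data_length by Rows gives the reported 509 bytes per row, and Data_length itself works out to roughly 99.7 GiB (about 107 GB) with no secondary indexes.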

SHOW PROCESSLIST shows the same query three times, and I don't know why:

+-----+------------+-----------+---------+---------+------+--------------+------------------------------------------------------------------------------------------------------+
| Id  | User       | Host      | db      | Command | Time | State        | Info                                                                                                 |
+-----+------------+-----------+---------+---------+------+--------------+------------------------------------------------------------------------------------------------------+
| 112 | phpmyadmin | localhost | NULL    | Sleep   | 4810 |              | NULL                                                                                                 |
| 113 | root       | localhost | buronal | Query   | 4810 | executing    | LOAD DATA INFILE '/var/lib/mysql-files/trade.csv'
INTO TABLE trade
CHARACTER SET latin1
FIELDS  |
| 182 | phpmyadmin | localhost | NULL    | Sleep   | 4128 |              | NULL                                                                                                 |
| 183 | root       | localhost | buronal | Query   | 4128 | executing    | LOAD DATA INFILE '/var/lib/mysql-files/trade.csv'
INTO TABLE trade
CHARACTER SET latin1
FIELDS  |
| 250 | phpmyadmin | localhost | NULL    | Sleep   | 3446 |              | NULL                                                                                                 |
| 251 | root       | localhost | buronal | Query   | 3446 | executing    | LOAD DATA INFILE '/var/lib/mysql-files/trade.csv'
INTO TABLE trade
CHARACTER SET latin1
FIELDS  |
| 484 | phpmyadmin | localhost | NULL    | Sleep   |  755 |              | NULL                                                                                                 |
| 485 | root       | localhost | buronal | Query   |  755 | Sending data | SELECT * FROM `trade` LIMIT 0, 25                                                                    |
| 526 | root       | localhost | NULL    | Query   |    0 | starting     | show processlist                                                                                     |
| 551 | phpmyadmin | localhost | NULL    | Sleep   |   73 |              | NULL                                                                                                 |
| 552 | root       | localhost | buronal | Query   |   73 | Sending data | SELECT * FROM `trade` LIMIT 0, 25                                                                    |
+-----+------------+-----------+---------+---------+------+--------------+------------------------------------------------------------------------------------------------------+

Answer:

It sounds like one or more of the following:

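  • BIGINT for columns that hold small numbers: "1,2,3,4" in the CSV averages 2 bytes per number (digit plus comma), whereas a BIGINT always takes 8 bytes plus overhead.
  • CHAR vs VARCHAR is another potential issue. CHAR(255) takes 255 bytes in latin1; VARCHAR(255) takes only the space needed (plus a 1-byte length prefix). Also, InnoDB handles CHAR differently from MyISAM.
  • Oversized DECIMAL columns
  • DOUBLE vs FLOAT (8 bytes vs 4)
  • INT for boolean flags (4 bytes, where TINYINT needs 1)
  • Perhaps others
  • The "overhead" in InnoDB is usually noticeably higher than in MyISAM (contrary to your first comparison).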
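To make the byte counts in that list concrete, here is a rough Python sketch. The sizes follow MySQL's documented storage requirements (fixed widths for integer and floating-point types, 4 bytes per group of 9 DECIMAL digits, a length prefix for VARCHAR). The sample values and helper names are hypothetical, and InnoDB per-row overhead (row headers, NULL bitmap, page fill) is deliberately ignored.

```python
# Rough per-value storage sizes behind the list above.
# Deliberately ignores InnoDB per-row overhead (headers, NULL bitmap, page fill).

FIXED_SIZES = {  # bytes per value for fixed-width MySQL types
    "TINYINT": 1,
    "INT": 4,
    "BIGINT": 8,
    "FLOAT": 4,
    "DOUBLE": 8,
}

# Bytes for 0..8 "leftover" DECIMAL digits; each full group of 9 digits takes 4 bytes.
_DECIMAL_LEFTOVER = [0, 1, 1, 2, 2, 3, 3, 4, 4]


def csv_bytes(value: str) -> int:
    """Bytes the value occupies in a latin1 CSV, counting its trailing comma."""
    return len(value.encode("latin-1")) + 1


def char_bytes(declared_len: int) -> int:
    """CHAR(n) in a single-byte charset such as latin1 always uses n bytes."""
    return declared_len


def varchar_bytes(value: str, declared_len: int) -> int:
    """VARCHAR(n): actual length plus a 1-byte prefix (2 bytes if n > 255 in latin1)."""
    prefix = 1 if declared_len <= 255 else 2
    return len(value.encode("latin-1")) + prefix


def decimal_bytes(precision: int, scale: int) -> int:
    """DECIMAL(M,D): the integer and fractional digits are packed separately."""
    def packed(digits: int) -> int:
        return (digits // 9) * 4 + _DECIMAL_LEFTOVER[digits % 9]
    return packed(precision - scale) + packed(scale)


if __name__ == "__main__":
    # Hypothetical sample values, not taken from trade.csv.
    print("flag '1':    csv", csv_bytes("1"), "| BIGINT", FIXED_SIZES["BIGINT"],
          "| INT", FIXED_SIZES["INT"], "| TINYINT", FIXED_SIZES["TINYINT"])
    print("name 'ACME': csv", csv_bytes("ACME"), "| CHAR(255)", char_bytes(255),
          "| VARCHAR(255)", varchar_bytes("ACME", 255))
    print("amount:      DECIMAL(65,30)", decimal_bytes(65, 30),
          "| DECIMAL(12,2)", decimal_bytes(12, 2))
```

Multiplied over roughly 210 million rows and about 60 columns, a few wasted bytes per value add up to many gigabytes, which is why the actual column definitions matter.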

Please provide SHOW CREATE TABLE so we can diagnose further.

Your my.cnf looks tight for 16 GB if you are using both MyISAM and InnoDB. Split the memory between the two caches: key_buffer_size and innodb_buffer_pool_size.

Once you realize you should use only InnoDB: key_buffer_size=50M and innodb_buffer_pool_size=10G.
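As a quick sanity check of that advice, the sketch below adds up the two caches named above and expresses them as a share of the 16 GB of RAM. The helpers are hypothetical: they understand only K/M/G suffixes and ignore the query cache, per-connection buffers and the memory the OS itself needs.

```python
from typing import Mapping

UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(value: str) -> int:
    """Convert a my.cnf-style size such as '10G' or '50M' into bytes (K/M/G only)."""
    value = value.strip().upper()
    if value and value[-1] in UNITS:
        return int(value[:-1]) * UNITS[value[-1]]
    return int(value)


def cache_share(settings: Mapping[str, str], ram: str) -> float:
    """Fraction of RAM claimed by the listed global caches."""
    total = sum(parse_size(v) for v in settings.values())
    return total / parse_size(ram)


# The question's current values and the answer's InnoDB-only suggestion.
CURRENT = {"key_buffer_size": "1G", "innodb_buffer_pool_size": "10G"}
INNODB_ONLY = {"key_buffer_size": "50M", "innodb_buffer_pool_size": "10G"}

if __name__ == "__main__":
    for label, settings in [("current", CURRENT), ("InnoDB only", INNODB_ONLY)]:
        print(f"{label}: {cache_share(settings, '16G'):.1%} of 16G RAM")
```

With InnoDB as the only engine, almost all of the cache budget goes to the buffer pool, and the MyISAM key buffer shrinks to a token size.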
