What is the best way to dump MySQL table data to csv and convert character encoding?
I have a table with about 200 columns. I need to take a dump of the daily transaction data for ETL purposes. It's a MySQL DB.

I tried that with Python, both using a pandas DataFrame and the basic write-to-CSV-file method. I even tried to look for the same functionality using a shell script; I saw one such script for an Oracle database using sqlplus. Following is my Python code with the two approaches:
Using Pandas:
import MySQLdb as mdb
import pandas as pd

host = ""
user = ''
pass_ = ''
db = ''

query = 'SELECT * FROM TABLE1'
conn = mdb.connect(host=host, user=user, passwd=pass_, db=db)
df = pd.read_sql(query, con=conn)
df.to_csv('resume_bank.csv', sep=',')
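For a 300,000-row table, one thing worth trying before abandoning pandas is streaming the result set in chunks rather than materialising the whole table at once. A minimal sketch, using an in-memory SQLite table with made-up columns as a stand-in for the MySQL connection (with MySQL you would pass the MySQLdb connection instead):

```python
import sqlite3
import pandas as pd

# Stand-in database; the table name and columns here are illustrative only.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE TABLE1 (id INTEGER, name TEXT)')
conn.executemany('INSERT INTO TABLE1 VALUES (?, ?)',
                 [(i, 'row%d' % i) for i in range(10)])

# Stream the result set in chunks instead of loading every row at once;
# write the header only for the first chunk, then append.
first = True
for chunk in pd.read_sql('SELECT * FROM TABLE1', con=conn, chunksize=4):
    chunk.to_csv('resume_bank.csv', sep=',', index=False,
                 mode='w' if first else 'a', header=first)
    first = False
```

This caps peak memory at one chunk's worth of rows, which often helps more with memory than with raw speed.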
Using basic python file write:
import MySQLdb
import csv
import datetime

currentDate = datetime.datetime.now().date()

host = ""
user = ''
pass_ = ''
db = ''
table = ''

con = MySQLdb.connect(user=user, passwd=pass_, host=host, db=db, charset='utf8')
cursor = con.cursor()
query = "SELECT * FROM %s;" % table
cursor.execute(query)

with open('Data_on_%s.csv' % currentDate, 'w') as f:
    writer = csv.writer(f)
    for row in cursor.fetchall():
        writer.writerow(row)

print('Done')
The table has about 300,000 records. It's taking too much time with both of the Python approaches.
Also, there's an issue with encoding here. The DB result set has some latin-1 characters for which I'm getting errors like:

UnicodeEncodeError: 'ascii' codec can't encode character '\x96' in position 1078: ordinal not in range(128)
I need to save the CSV in Unicode format. Can you please help me with the best approach to perform this task?
A Unix-based or Python-based solution will work for me. This script needs to be run daily to dump daily data.
You can achieve that just by leveraging MySQL. For example:
SELECT * FROM your_table WHERE ...
INTO OUTFILE 'your_file.csv'
FIELDS TERMINATED BY ','
OPTIONALLY ENCLOSED BY '"'
ESCAPED BY '\\'
LINES TERMINATED BY '\n';
If you need to schedule your query, put it into a file (e.g., csv_dump.sql) and create a cron task like this one:
00 00 * * * mysql -h your_host -u user -ppassword < /foo/bar/csv_dump.sql
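Since INTO OUTFILE writes the file on the server, the output encoding can also be pinned down with a CHARACTER SET clause in the statement itself. A sketch of a small Python helper (the function name and defaults are hypothetical) that assembles such a statement:

```python
def build_outfile_query(table, path, where='1=1', charset='utf8mb4'):
    """Assemble a SELECT ... INTO OUTFILE statement (hypothetical helper).

    Note: the file is written on the *server* host and the path must not
    already exist; the CHARACTER SET clause controls the output encoding.
    Table/column names are interpolated, so only use trusted values here.
    """
    return (
        "SELECT * FROM %s WHERE %s "
        "INTO OUTFILE '%s' "
        "CHARACTER SET %s "
        "FIELDS TERMINATED BY ',' "
        "OPTIONALLY ENCLOSED BY '\"' "
        "ESCAPED BY '\\\\' "
        "LINES TERMINATED BY '\\n';" % (table, where, path, charset)
    )

print(build_outfile_query('your_table', '/tmp/your_file.csv'))
```

You would execute the returned statement through your normal cursor, or paste it into csv_dump.sql for the cron job above.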
For strings this will use the default character encoding, which happens to be ASCII, and this fails when you have non-ASCII characters. You want unicode instead of str.
rows = cursor.fetchall()
f = open('Data_on_%s.csv' % currentDate, 'w')
myFile = csv.writer(f)
for row in rows:
    myFile.writerow([unicode(s).encode("utf-8") for s in row])
f.close()
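Under Python 3 the unicode()/encode() dance is unnecessary: open the file as UTF-8 text and the csv module writes str values directly. A minimal sketch, with a hard-coded list standing in for cursor.fetchall(); note that '\x96' is only a printable dash under Windows-1252, which is often loosely called latin-1:

```python
import csv

# Sample rows standing in for cursor.fetchall(); the second row contains
# the byte 0x96 decoded as Windows-1252, i.e. an en dash.
rows = [(1, 'plain'), (2, b'\x96'.decode('cp1252'))]

# Python 3: open the file as UTF-8 text; csv writes str values directly,
# so no per-cell encode() call is needed.
with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerows(rows)
```

The same pattern applies to the question's code: pass encoding='utf-8' (and newline='') to open() and drop the manual encoding entirely.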
You can use mysqldump for this task. (Source for command)
mysqldump -u username -p --tab -T/path/to/directory dbname table_name --fields-terminated-by=','
The arguments are as follows:

-u username for the username
-p to indicate that a password should be used
-ppassword to give the password via command line
--tab Produce tab-separated data files

For more command line switches see https://dev.mysql.com/doc/refman/5.5/en/mysqldump.html
To run it on a regular basis, create a cron task as described in the other answers.
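If you prefer to keep the whole daily job in Python rather than raw cron + shell, the mysqldump invocation can be assembled and launched with subprocess. A sketch under the assumption that mysqldump is on PATH; the helper name is hypothetical, and in a cron context you would supply the password via a my.cnf credentials file rather than the interactive -p prompt:

```python
import datetime
import subprocess  # needed only if you actually launch the command

def mysqldump_command(user, db, table, out_dir):
    """Build the mysqldump argv for a tab-separated dump (hypothetical helper)."""
    return [
        'mysqldump', '-u', user,
        '--tab=%s' % out_dir,          # server writes table_name.txt here
        '--fields-terminated-by=,',    # comma-separated instead of tabs
        db, table,
    ]

cmd = mysqldump_command('username', 'dbname', 'table_name',
                        '/path/to/dumps/%s' % datetime.date.today())
# subprocess.run(cmd, check=True)  # uncomment to execute against a live server
print(' '.join(cmd))
```

Building the argv as a list (rather than one shell string) avoids quoting issues when paths contain spaces.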