
Export a very large SQL file to CSV with Python or R

I have a large SQL file (20 GB) that I would like to convert into CSV. I plan to load the file into Stata for analysis. I have enough RAM to load the entire file (my computer has 32 GB of RAM).

The problem is that the Python solutions I have found online so far (using sqlite3) seem to require more RAM than my system has in order to:

  • read the SQL
  • write the CSV

Here is the code:

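import sqlite3
import pandas as pd

# Connect to the SQLite database file
con = sqlite3.connect('mydata.sql')
query = 'select * from mydata'
# read_sql loads the entire result into a single in-memory DataFrame
data = pd.read_sql(query, con)
data.to_csv('export.csv')
con.close()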

The SQL file contains about 15 variables, which can be timestamps, strings, or numerical values. Nothing really fancy.

I think one possible solution would be to read the SQL and write the CSV file one line at a time. However, I have no idea how to do that (either in R or in Python).

Any help is really appreciated!
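A minimal sketch of the one-row-at-a-time idea in Python, assuming (as the code above does) that mydata.sql is a SQLite database file rather than a text dump of SQL statements. It iterates over a sqlite3 cursor, which fetches rows lazily, and writes each row with the standard csv module. The function name export_table_row_by_row is hypothetical.

```python
import csv
import sqlite3


def export_table_row_by_row(db_path, table, csv_path):
    """Stream every row of `table` from a SQLite database file into a CSV file.

    Iterating over a sqlite3 cursor yields rows one at a time, so the whole
    table is never materialized in Python. Returns the number of data rows.
    """
    connection = sqlite3.connect(db_path)
    try:
        cursor = connection.cursor()
        # A table name cannot be a bound parameter, so it is interpolated here;
        # only pass trusted table names.
        cursor.execute(f'SELECT * FROM "{table}"')
        # Column names are available from the cursor after execute()
        header = [col[0] for col in cursor.description]
        # newline='' lets the csv module control line endings itself
        with open(csv_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(header)
            rows_written = 0
            for row in cursor:  # one row at a time
                writer.writerow(row)  # None is written as an empty field
                rows_written += 1
        return rows_written
    finally:
        connection.close()
```

For the file in the question this would be called as export_table_row_by_row('mydata.sql', 'mydata', 'export.csv'). The csv module quotes strings that contain commas or quotes, so text columns stay intact.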

You can read the SQL database in batches and write each batch to the file, instead of reading the whole database at once. Credit to "How to add pandas data to an existing csv file?" for how to append to an existing CSV file.

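import sqlite3
import pandas as pd

# Create a connection and get a cursor
connection = sqlite3.connect('mydata.sql')
cursor = connection.cursor()
# Execute the query
cursor.execute('select * from mydata')
# Take the column names from the cursor so the CSV gets a header row
columns = [col[0] for col in cursor.description]

# Open the output file (pandas expects newline='' for text-mode file handles)
with open('output.csv', 'w', newline='') as f:
    first_batch = True
    # Get data in batches
    while True:
        # Read up to 1000 rows
        rows = cursor.fetchmany(1000)
        # We are done if there are no more rows
        if not rows:
            break
        df = pd.DataFrame(rows, columns=columns)
        # Write the header only once and leave out the pandas row index,
        # which would otherwise restart at 0 in every batch
        df.to_csv(f, header=first_batch, index=False)
        first_batch = False

# Clean up
cursor.close()
connection.close()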
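The same batching can also be delegated to pandas: with chunksize set, pandas.read_sql_query returns an iterator of DataFrames instead of one large DataFrame. This is a sketch assuming a SQLite file as above; the function name export_in_chunks and the default of 100,000 rows per chunk are illustrative choices, not part of the answer.

```python
import sqlite3

import pandas as pd


def export_in_chunks(db_path, query, csv_path, chunksize=100_000):
    """Write the result of `query` to `csv_path` one DataFrame chunk at a time.

    Returns the total number of rows written.
    """
    connection = sqlite3.connect(db_path)
    try:
        # With chunksize set, read_sql_query yields DataFrames lazily
        chunks = pd.read_sql_query(query, connection, chunksize=chunksize)
        total = 0
        for i, chunk in enumerate(chunks):
            chunk.to_csv(
                csv_path,
                mode="w" if i == 0 else "a",  # overwrite first, then append
                header=(i == 0),              # column names only once
                index=False,                  # skip the pandas row index
            )
            total += len(chunk)
        return total
    finally:
        connection.close()
```

For the question's data this would be export_in_chunks('mydata.sql', 'select * from mydata', 'output.csv'). Larger chunks mean fewer, bigger writes at the cost of more memory per chunk.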

Use the sqlite3 command-line program like this from the Windows command line or a UNIX shell:

sqlite3 -csv "mydata.sql" "select * from mydata;" > mydata.csv

If mydata.sql is not in the current directory, use its path; on Windows, use forward slashes rather than backslashes.

Alternatively, run sqlite3

sqlite3

and enter these commands at the sqlite prompt:

.open "mydata.sql"
.output mydata.csv
.mode csv
select * from mydata;
.quit

(Or put them in a file called, say, run, and use sqlite3 < run.)
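The scripted variant can also be driven from Python, for example as part of a larger pipeline. This sketch assumes a sqlite3 executable on the PATH; it feeds the same dot-commands on stdin (the equivalent of sqlite3 < run) and, as an added assumption, turns on .headers so the CSV starts with column names. The function name sqlite_cli_export is hypothetical, and as noted above, Windows paths should use forward slashes.

```python
import subprocess


def sqlite_cli_export(db_path, query, csv_path, sqlite_bin="sqlite3"):
    """Run the sqlite3 shell with dot-commands on stdin, like `sqlite3 < run`.

    The sqlite3 program writes rows straight to `csv_path`, so Python never
    holds the table in memory. Paths containing double quotes are not handled.
    """
    script = "\n".join([
        f'.open "{db_path}"',
        ".headers on",  # assumption: a header row is wanted
        ".mode csv",
        f'.output "{csv_path}"',
        query.strip().rstrip(";") + ";",
        ".quit",
    ]) + "\n"
    # check=True raises CalledProcessError if sqlite3 exits with an error status
    subprocess.run([sqlite_bin], input=script, text=True, check=True)
```

For the question this would be sqlite_cli_export('mydata.sql', 'select * from mydata', 'mydata.csv').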

Load the .sql file into a MySQL database and export it as CSV.

Commands to load a MySQL dump file into a MySQL database:

Create a MySQL database:

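create database <database_name>;

Then load the dump file into it from the shell, using the mysql client (mysqldump only creates dumps, it does not load them):

mysql -u root -p <database_name> < dumpfilename.sql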

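Commands to export a MySQL table as CSV. Note that INTO OUTFILE writes the file on the MySQL server host, and the server's secure_file_priv setting can restrict which directories it may write to: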

mysql -u root -p
use <database_name>

SELECT * INTO OUTFILE 'file.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
FROM <table_name>;

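Notice: technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please cite this site's URL or the original address. For any questions, contact yoyou2525@163.com.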

 