How to import data from csv file into my table in Sqlite3 using Python

I want to import all data (around 200,000 rows) from a CSV file (which has 4 columns) into an already created table (which has 5 columns) in a sqlite3 database using Python (a Pandas dataframe). The data types in the CSV file and the table are compatible. The problem is that the table has one extra column, index_of (the primary key).

These are the first 3 lines of my CSV file:

[screenshot of the first three rows of the CSV file]

This is all I could come up with. I guess that even if it works, it would take about 5-6 hours, because this code uses a for loop to read every row:

import sqlite3

connection = sqlite3.connect("db_name.sqlite")
cursor = connection.cursor()
with open('path_to_csv', 'r') as file:
    no_records = 0
    for row in file: 
        cursor.execute("INSERT INTO table_name (index_of, high, low, original, ship_date) VALUES (?,?,?,?,?)", row.split(","))
        connection.commit()
        no_records += 1  
        
connection.close()

but it shows me an error: Exception has occurred: OperationalError 5 values for 4 columns

Please, can you help me with this:

  1. How to import 200,000 rows fast using Python?

  2. How to import all columns from the CSV file into the table's specific columns?

You need to provide a default value for the 5th column.
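For example, if index_of is declared as INTEGER PRIMARY KEY, you can bind None for it and SQLite will assign the key automatically. A minimal sketch reusing the question's table and file names:

import sqlite3

connection = sqlite3.connect("db_name.sqlite")
cursor = connection.cursor()

with open('path_to_csv', 'r') as file:
    for row in file:
        # None lets SQLite auto-assign an INTEGER PRIMARY KEY
        values = [None, *row.strip().split(",")]
        cursor.execute("INSERT INTO table_name (index_of, high, low, original, ship_date) VALUES (?,?,?,?,?)", values)

connection.commit()
connection.close()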

You could also improve the performance of the script by inserting chunks of 100-200 rows in each SQL statement, as sketched below.
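A sketch of that idea, assuming the same 4-column CSV as in the question. The chunk size is kept at 100 so the bound parameters (5 per row) stay below SQLite's default limit of 999 variables per statement in older builds; like the sketch above, it binds None so SQLite assigns the primary key:

import csv
import sqlite3
from itertools import islice

CHUNK_SIZE = 100  # 100 rows x 5 columns = 500 bound variables per statement

connection = sqlite3.connect("db_name.sqlite")
cursor = connection.cursor()

with open('path_to_csv', 'r') as file:
    reader = csv.reader(file)
    while True:
        chunk = list(islice(reader, CHUNK_SIZE))
        if not chunk:
            break
        # one INSERT statement with one "(?,?,?,?,?)" group per row
        placeholders = ",".join(["(?,?,?,?,?)"] * len(chunk))
        params = []
        for row in chunk:
            params.extend([None, *row])  # None -> auto-assigned primary key
        cursor.execute(
            "INSERT INTO table_name (index_of, high, low, original, ship_date) "
            "VALUES " + placeholders, params)

connection.commit()
connection.close()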

user3380595 has already pointed out in their answer that you need to provide a value for the column index_of.

cursor.execute("""
    INSERT INTO Quotes (index_of, high, low, original, ship_date)
    VALUES (?, ?, ?, ?, ?)
    """, [index, *row])

I created 200,000 lines of test data and it loaded quite fast (less than 2 seconds). See the first example using csv and sqlite3.
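The generator for the test data is not shown here; a minimal sketch that would produce a compatible headerless 200,000-row CSV (the value ranges are arbitrary):

import csv
import random

test_data = r"/home/thomas/Projects/Playground/stackoverflow/data/test.csv"

with open(test_data, "w", newline="") as file:
    writer = csv.writer(file)
    for _ in range(200_000):
        low = round(random.uniform(0.0, 100.0), 2)
        high = round(low + random.uniform(0.0, 100.0), 2)
        original = round(random.uniform(0.0, 200.0), 2)
        writer.writerow([high, low, original, "2021-01-01"])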

As user3380595 mentioned, you could load the data in chunks if you are concerned about memory and performance. This scenario actually loaded slightly slower. See the second example using pandas and sqlalchemy.


Using csv and sqlite3

Setup Test Environment

import csv
import sqlite3
import contextlib

test_data = r"/home/thomas/Projects/Playground/stackoverflow/data/test.csv"
test_db = r"/home/thomas/Projects/Playground/stackoverflow/data/test.db"

with contextlib.closing(sqlite3.connect(test_db)) as connection:
    
    cursor = connection.cursor()

    cursor.execute("DROP TABLE IF EXISTS Quotes;")

    cursor.execute("""
        CREATE TABLE IF NOT EXISTS Quotes (
            index_of INTEGER, -- PRIMARY KEY,
            high REAL,
            low REAL,
            original REAL,
            ship_date TEXT
        );
        """)

    connection.commit()

Load Data

with contextlib.closing(sqlite3.connect(test_db)) as connection:
    
    cursor = connection.cursor()

    with open(test_data, "r") as file:
        
        for index, row in enumerate(csv.reader(file)):
            cursor.execute("""
                INSERT INTO Quotes (index_of, high, low, original, ship_date)
                VALUES (?, ?, ?, ?, ?)
                """, [index, *row])

    connection.commit()
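Note that, unlike the code in the question, this commits once after the loop rather than once per row; keeping all inserts inside a single transaction is a large part of the speedup.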

Using pandas and sqlalchemy

Setup Test Environment

import pandas as pd

from sqlalchemy import create_engine, text

test_data = r"/home/thomas/Projects/Playground/stackoverflow/data/test.csv"
test_db = r"sqlite:////home/thomas/Projects/Playground/stackoverflow/data/test.db"

engine = create_engine(test_db, echo=True)

with engine.begin() as connection:

    connection.execute(text("DROP TABLE IF EXISTS Quotes;"))

    connection.execute(text("""
        CREATE TABLE IF NOT EXISTS Quotes (
            index_of INTEGER, -- PRIMARY KEY,
            high REAL,
            low REAL,
            original REAL,
            ship_date TEXT
        );
        """))

Load Data (in chunks)

with engine.begin() as connection:

    # the test CSV has no header row, so supply the column names
    reader = pd.read_csv(test_data, header=None,
                         names=["high", "low", "original", "ship_date"],
                         iterator=True, chunksize=50000)

    for chunk in reader:
        # read_csv keeps a continuing RangeIndex across chunks,
        # so chunk.index supplies unique values for index_of
        chunk["index_of"] = chunk.index
        chunk.to_sql("Quotes", con=connection, if_exists="append", index=False)

Alternatively, instead of using pandas, you could also use sqlite3.Cursor.executemany and process chunks of rows, as in the sketch below.
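A sketch of that alternative, reusing test_data, test_db and the Quotes table from the setup above:

import csv
import sqlite3
import contextlib
from itertools import islice

CHUNK_SIZE = 50000

# test_data and test_db as defined in the setup above
with contextlib.closing(sqlite3.connect(test_db)) as connection:

    cursor = connection.cursor()

    with open(test_data, "r") as file:
        reader = csv.reader(file)
        index = 0
        while True:
            chunk = list(islice(reader, CHUNK_SIZE))
            if not chunk:
                break
            # prepend a running index for the index_of column
            rows = [[index + i, *row] for i, row in enumerate(chunk)]
            index += len(chunk)
            cursor.executemany("""
                INSERT INTO Quotes (index_of, high, low, original, ship_date)
                VALUES (?, ?, ?, ?, ?)
                """, rows)

    connection.commit()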
