I am very new to Python, and after reading several articles online I am not sure how to proceed. I have a 3.56 GB CSV file that I am trying to subset into multiple data frames using the pandas and SQLAlchemy packages.
I used the following code to convert the CSV file into a SQLite database, and now I am trying to query the database and store the results in a data frame called test. However, whenever I execute the query at the bottom of the code I get the following error: OperationalError: (sqlite3.OperationalError) near "table": syntax error [SQL: 'SELECT COL1, COL6, COL7 FROM table where COL1 = 2001'] (Background on this error at: http://sqlalche.me/e/e3q8)
I have also tried selecting all of the columns in the database by using "SELECT* FROM table where COL1 = 2000" in the SQL query. However, it returns the same error.
import pandas as pd
from sqlalchemy import create_engine

file = "/Users/benalbert/Desktop/Econ522/usa_00001.csv"
csv_database = create_engine("sqlite:///csv_database.db")

chunksize = 1000
i = 0
j = 1
for df in pd.read_csv(file, chunksize=chunksize, iterator=True):
    df = df.rename(columns={c: c.replace(' ', '') for c in df.columns})
    df.index += j
    i += 1
    df.to_sql('table', csv_database, if_exists='append')
    j = df.index[-1] + 1

test = pd.read_sql_query('SELECT COL1, COL6, COL7 FROM table where COL1 = 2001', csv_database)
The desired output is a new data frame containing observations from only columns 1, 6, and 7 when column 1 has a value of 2001.
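The error occurs because table is a reserved keyword in SQLite, so the parser rejects it when it appears unquoted as a table name. Change the table name from 'table' to anything else and the code works. This worked for me: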
import pandas as pd
from sqlalchemy import create_engine

file = "arandomlargefileIhad.csv"
csv_database = create_engine("sqlite:///csv_database.db")
cnx = csv_database.raw_connection()

chunksize = 1000
i = 0
j = 1
for df in pd.read_csv(file, chunksize=chunksize, iterator=True):
    df.index += j
    i += 1
    df.to_sql('random', csv_database, if_exists='append')
    j = df.index[-1] + 1

sql_statement = "SELECT * FROM random"
test = pd.read_sql_query(sql_statement, csv_database)  # this works
test2 = pd.read_sql_query(sql_statement, cnx)  # so does this
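If renaming the table is not an option, quoting the identifier should also work. The following is a minimal sketch using only the standard-library sqlite3 module and an in-memory database. The three sample rows and the column types are made up for illustration and are not from the original CSV. It reproduces the syntax error with the unquoted name and then runs the same query with the name in double quotes.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Double-quoting the identifier lets SQLite accept the reserved word
# as a table name. The columns and rows below are hypothetical sample data.
conn.execute('CREATE TABLE "table" (COL1 INTEGER, COL6 INTEGER, COL7 INTEGER)')
conn.executemany(
    'INSERT INTO "table" VALUES (?, ?, ?)',
    [(2000, 1, 2), (2001, 3, 4), (2001, 5, 6)],
)

# Unquoted, TABLE is parsed as a keyword and the statement is rejected.
try:
    conn.execute("SELECT COL1, COL6, COL7 FROM table WHERE COL1 = 2001")
    unquoted_error = None
except sqlite3.OperationalError as exc:
    unquoted_error = str(exc)

# Quoted, the same query runs; the year is passed as a bound parameter.
rows = conn.execute(
    'SELECT COL1, COL6, COL7 FROM "table" WHERE COL1 = ? ORDER BY COL6',
    (2001,),
).fetchall()

print(unquoted_error)
print(rows)
```

The same quoted query string can be passed to pd.read_sql_query in place of the unquoted one, although renaming the table as shown above is the simpler fix.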