
Creating Data Frame From Database Query in Python

I am very new to Python and, after reading several articles online, I am not sure how to proceed. I have a 3.56 GB CSV file that I am trying to subset into multiple data frames using the pandas and sqlalchemy packages.

I used the code below to convert the CSV file into a SQLite database, and now I am trying to query the database and store the results in a data frame called test. However, whenever I execute the query at the bottom of the code, I get the following error:

OperationalError: (sqlite3.OperationalError) near "table": syntax error [SQL: 'SELECT COL1, COL6, COL7 FROM table where COL1 = 2001'] (Background on this error at: http://sqlalche.me/e/e3q8)

I have also tried selecting all of the columns by using "SELECT* FROM table where COL1 = 2000" as the SQL query, but it returns the same error.

import pandas as pd
from sqlalchemy import create_engine

file = "/Users/benalbert/Desktop/Econ522/usa_00001.csv"

csv_database = create_engine("sqlite:///csv_database.db")
chunksize = 1000
i = 0
j = 1
for df in pd.read_csv(file, chunksize=chunksize, iterator=True):
    df = df.rename(columns={c: c.replace(' ', '') for c in df.columns})
    df.index += j
    i += 1
    df.to_sql('table', csv_database, if_exists='append')
    j = df.index[-1] + 1

test = pd.read_sql_query('SELECT COL1, COL6, COL7 FROM table where COL1 = 2001', csv_database)

The desired output is a new data frame containing observations from only columns 1, 6, and 7 when column 1 has a value of 2001.

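TABLE is a reserved keyword in SQLite, so an unquoted table named table makes the query fail with a syntax error. Change the table name from 'table' to anything else and the code works. This worked for me: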

import pandas as pd
from sqlalchemy import create_engine

file = "arandomlargefileIhad.csv"

csv_database = create_engine("sqlite:///csv_database.db")
cnx = csv_database.raw_connection()

chunksize = 1000
i = 0
j = 1
for df in pd.read_csv(file, chunksize=chunksize, iterator=True):
    df.index += j  # continue the index where the previous chunk ended
    i += 1
    df.to_sql('random', csv_database, if_exists='append')  # any non-reserved table name
    j = df.index[-1] + 1

sql_statement = "SELECT * FROM random"
test = pd.read_sql_query(sql_statement, csv_database)  # this works
test2 = pd.read_sql_query(sql_statement, cnx)  # so does this
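
If renaming the table is not an option, quoting the name in the query should also work, since SQLite accepts a reserved word as an identifier when it is wrapped in double quotes. The sketch below uses only the standard-library sqlite3 module and an in-memory database with made-up rows (the data and the run_query helper are illustrative assumptions, not part of the original code). SQLAlchemy presumably quotes the name itself when to_sql creates the table, which would explain why the import succeeded while the hand-written query did not.

```python
import sqlite3

# In-memory database standing in for csv_database.db (made-up rows for illustration).
conn = sqlite3.connect(":memory:")

# Double-quoting the identifier lets SQLite accept the reserved word as a table name.
conn.execute('CREATE TABLE "table" (COL1 INTEGER, COL6 INTEGER, COL7 INTEGER)')
conn.executemany(
    'INSERT INTO "table" VALUES (?, ?, ?)',
    [(2000, 1, 10), (2001, 2, 20), (2001, 3, 30)],
)


def run_query(sql, params=()):
    """Hypothetical helper: return all rows, or the error text if SQLite rejects the SQL."""
    try:
        return conn.execute(sql, params).fetchall()
    except sqlite3.OperationalError as exc:
        return f"OperationalError: {exc}"


# Unquoted reserved word: SQLite rejects the statement with a syntax error.
unquoted = run_query("SELECT COL1, COL6, COL7 FROM table WHERE COL1 = ?", (2001,))

# Quoted identifier: the same query runs and returns only the 2001 rows.
quoted = run_query('SELECT COL1, COL6, COL7 FROM "table" WHERE COL1 = ?', (2001,))

print(unquoted)
print(quoted)
```

The quoted query string can be passed to pd.read_sql_query in the same way as the statements above. Renaming the table is still the simpler fix, because it avoids having to remember the quotes in every later query.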

