I have about 100 .db files stored on my Google Drive, and I want to run the same SQL query on each of them. I'd like to store the query results in a single .csv file.
I've managed to use the following code to write the results of a single SQL query into a .csv file, but I am unable to make it work for multiple files.
import sqlite3
import pandas as pd
conn = sqlite3.connect('/content/drive/My Drive/Data/month_2014_01.db')
df = pd.read_sql_query("SELECT * FROM messages INNER JOIN users ON messages.id = users.id WHERE text LIKE '%house%'", conn)
df.to_csv('/content/drive/My Drive/Data/Query_Results.csv')
This is the code that I have used so far to try to make it work for all files, based on this post.
databases = []
directory = '/content/drive/My Drive/Data/'
for filename in os.listdir(directory):
    flname = os.path.join(directory, filename)
    databases.append(flname)
for database in databases:
    try:
        with sqlite3.connect(database) as conn:
            conn.text_factory = str
            cur = conn.cursor()
            cur.execute(row["SELECT * FROM messages INNER JOIN users ON messages.id = users.id WHERE text LIKE '%house%'"])
            df.loc[index,'Results'] = cur.fetchall()
    except sqlite3.Error as err:
        print ("[INFO] %s" % err)
But this throws an error: TypeError: tuple indices must be integers or slices, not str. I'm obviously doing something wrong, and I would much appreciate any tips that point towards an answer.
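A note on the error message itself: it matches what Python raises when a tuple is indexed with a string. Assuming `row` in the snippet above is a tuple left over from earlier code (it is not defined in what is shown), `row["SELECT ..."]` would fail before the query ever reaches SQLite. A minimal reproduction, using a hypothetical `row` value:

```python
# Hypothetical stand-in for whatever `row` held in the original session.
row = (1, "some text")

def index_with_string(t):
    """Index a tuple with a string and return the resulting error message."""
    try:
        t["SELECT * FROM messages"]
    except TypeError as err:
        return str(err)
    return None

message = index_with_string(row)
print(message)
```

If that is the cause, passing the SQL string directly to `cur.execute(...)` instead of wrapping it in `row[...]` should get past this particular error.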
Consider building a list of data frames, then concatenating them into a single data frame with pandas.concat:
import os
import sqlite3
import pandas as pd

gdrive = "/content/drive/My Drive/Data/"
sql = """SELECT * FROM messages
         INNER JOIN users ON messages.id = users.id
         WHERE text LIKE '%house%'
      """

def build_df(db):
    # RUN THE QUERY AGAINST ONE DATABASE FILE AND RETURN ITS RESULTS
    with sqlite3.connect(os.path.join(gdrive, db)) as conn:
        df = pd.read_sql_query(sql, conn)
    return df

# BUILD LIST OF DFs WITH LIST COMPREHENSION
df_list = [build_df(db) for db in os.listdir(gdrive) if db.endswith('.db')]

# CONCATENATE ALL DFs INTO SINGLE DF FOR EXPORT
final_df = pd.concat(df_list, ignore_index = True)
final_df.to_csv(os.path.join(gdrive, 'Query_Results.csv'), index = False)
Better yet, consider SQLite's ATTACH DATABASE and append the query results to a master table. This also avoids pulling in pandas, a heavy third-party data science library, for a simple data migration task. In addition, all data stays inside SQLite until export, so there are no data type conversion or I/O transfer issues to worry about.
import csv
import os
import sqlite3

gdrive = "/content/drive/My Drive/Data/"
main_db = 'month_2014_01.db'

with sqlite3.connect(os.path.join(gdrive, main_db)) as conn:
    # CREATE MASTER TABLE FROM THE MAIN DATABASE'S OWN TABLES
    cur = conn.cursor()
    cur.execute("DROP TABLE IF EXISTS master_query")
    cur.execute("""CREATE TABLE master_query AS
                   SELECT * FROM messages
                   INNER JOIN users
                      ON messages.id = users.id
                   WHERE text LIKE '%house%'
                """)
    conn.commit()

    # ITERATIVELY ATTACH AND APPEND RESULTS
    for db in os.listdir(gdrive):
        # SKIP THE MAIN DATABASE, WHOSE ROWS ARE ALREADY IN THE MASTER TABLE
        if db.endswith('.db') and db != main_db:
            cur.execute("ATTACH DATABASE ? AS tmp", [os.path.join(gdrive, db)])
            cur.execute("""INSERT INTO master_query
                           SELECT * FROM tmp.messages
                           INNER JOIN tmp.users
                              ON tmp.messages.id = tmp.users.id
                           WHERE text LIKE '%house%'
                        """)
            # COMMIT BEFORE DETACHING: AN ATTACHED DATABASE CANNOT BE
            # DETACHED WHILE A TRANSACTION IS STILL OPEN ON IT
            conn.commit()
            cur.execute("DETACH DATABASE tmp")

    # WRITE ROWS TO CSV (TEXT MODE WITH newline='' FOR PYTHON 3)
    data = cur.execute("SELECT * FROM master_query")
    with open(os.path.join(gdrive, 'Query_Results.csv'), 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([i[0] for i in cur.description])  # HEADERS
        writer.writerows(data)                            # DATA
    cur.close()
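To try the ATTACH approach without touching the real files, here is a small self-contained sketch that runs against throwaway databases in a temporary directory. It is a variant, not the code above: the master table lives in an in-memory database rather than in the first monthly file, the columns `id`, `text` and `name` are assumed for illustration, and the `source` column (recording which file each row came from) and the helper names `make_demo_db` and `collect` are hypothetical additions.

```python
import csv
import os
import sqlite3
import tempfile

# Same join and filter as above, with an explicit column list (assumed schema).
QUERY = """SELECT m.id, m.text, u.name
           FROM {schema}.messages AS m
           INNER JOIN {schema}.users AS u ON m.id = u.id
           WHERE m.text LIKE '%house%'"""

def make_demo_db(path, messages, users):
    """Create a small throwaway database with the two tables the query expects."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE messages (id INTEGER, text TEXT)")
    conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO messages VALUES (?, ?)", messages)
    conn.executemany("INSERT INTO users VALUES (?, ?)", users)
    conn.commit()
    conn.close()

def collect(directory, out_csv):
    """Attach each .db file in turn, append matching rows, then export to CSV."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE master_query (source TEXT, id INTEGER, text TEXT, name TEXT)"
    )
    for name in sorted(os.listdir(directory)):
        if not name.endswith(".db"):
            continue
        conn.execute("ATTACH DATABASE ? AS src", (os.path.join(directory, name),))
        conn.execute(
            "INSERT INTO master_query SELECT ?, q.* FROM ("
            + QUERY.format(schema="src") + ") AS q",
            (name,),
        )
        conn.commit()  # close the transaction before detaching
        conn.execute("DETACH DATABASE src")
    cur = conn.execute("SELECT * FROM master_query ORDER BY source, id")
    header = [col[0] for col in cur.description]
    rows = cur.fetchall()
    conn.close()
    with open(out_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(rows)
    return header, rows

with tempfile.TemporaryDirectory() as tmpdir:
    make_demo_db(os.path.join(tmpdir, "month_a.db"),
                 [(1, "big house"), (2, "car")], [(1, "ann"), (2, "bob")])
    make_demo_db(os.path.join(tmpdir, "month_b.db"),
                 [(3, "houseboat")], [(3, "cy")])
    header, rows = collect(tmpdir, os.path.join(tmpdir, "Query_Results.csv"))
```

Keeping the master table in memory leaves every monthly file unmodified, which may or may not be what you want; writing it into one of the files, as in the answer above, keeps the combined results around after the script ends.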