How to conduct SQL queries on multiple .db files and store the results in a .csv?
I have about 100 .db files stored on my Google Drive which I want to run the same SQL query on. I'd like to store these query results in a single .csv file.
I've managed to use the following code to write the results of a single SQL query into a .csv file, but I am unable to make it work for multiple files.
conn = sqlite3.connect('/content/drive/My Drive/Data/month_2014_01.db')
df = pd.read_sql_query("SELECT * FROM messages INNER JOIN users ON messages.id = users.id WHERE text LIKE '%house%'", conn)
df.to_csv('/content/drive/My Drive/Data/Query_Results.csv')
This is the code that I have used so far to try and make it work for all files, based on this post.
databases = []
directory = '/content/drive/My Drive/Data/'

for filename in os.listdir(directory):
    flname = os.path.join(directory, filename)
    databases.append(flname)

for database in databases:
    try:
        with sqlite3.connect(database) as conn:
            conn.text_factory = str
            cur = conn.cursor()
            cur.execute(row["SELECT * FROM messages INNER JOIN users ON messages.id = users.id WHERE text LIKE '%house%'"])
            df.loc[index, 'Results'] = cur.fetchall()
    except sqlite3.Error as err:
        print("[INFO] %s" % err)
But this throws me an error: TypeError: tuple indices must be integers or slices, not str. I'm obviously doing something wrong and I would much appreciate any tips that would point towards an answer.
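First, the TypeError itself: `cur.execute(row[...])` subscripts `row` with the SQL string, and `row` here is evidently a tuple (presumably left over from an earlier loop), which only accepts integer or slice indices. A minimal reproduction of the same error, with a hypothetical stand-in value for `row`:

```python
# `row` stands in for a record tuple (hypothetical value for illustration)
row = (1, "some text")

try:
    row["SELECT * FROM messages"]  # subscripting a tuple with a string
except TypeError as err:
    print(err)  # tuple indices must be integers or slices, not str
```

The immediate fix is to pass the SQL string directly to `cur.execute(...)` rather than wrapping it in `row[...]`. For the larger goal, though, consider the approaches below.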
Consider building a list of data frames, then concatenating them together into a single data frame with pandas.concat:
import os
import sqlite3

import pandas as pd

gdrive = "/content/drive/My Drive/Data/"
sql = """SELECT * FROM messages
         INNER JOIN users ON messages.id = users.id
         WHERE text LIKE '%house%'
      """

def build_df(db):
    with sqlite3.connect(os.path.join(gdrive, db)) as conn:
        df = pd.read_sql_query(sql, conn)
    return df

# BUILD LIST OF DFs WITH LIST COMPREHENSION
df_list = [build_df(db) for db in os.listdir(gdrive) if db.endswith('.db')]

# CONCATENATE ALL DFs INTO SINGLE DF FOR EXPORT
final_df = pd.concat(df_list, ignore_index=True)
final_df.to_csv(os.path.join(gdrive, 'Query_Results.csv'), index=False)
Better yet, consider SQLite's ATTACH DATABASE and append query results into a master table. This also avoids using pandas, a heavy third-party data science library, for simple data migration needs. Plus, you keep all database data inside SQLite without worrying about data type conversion and I/O transfer issues.
import csv
import os
import sqlite3

with sqlite3.connect(os.path.join(gdrive, 'month_2014_01.db')) as conn:
    cur = conn.cursor()

    # CREATE MASTER TABLE FROM THIS DATABASE'S OWN TABLES
    cur.execute("DROP TABLE IF EXISTS master_query")
    cur.execute("""CREATE TABLE master_query AS
                   SELECT * FROM messages
                   INNER JOIN users
                       ON messages.id = users.id
                   WHERE text LIKE '%house%'
                """)
    conn.commit()

    # ITERATIVELY ATTACH AND APPEND RESULTS FROM THE OTHER DATABASES
    for db in os.listdir(gdrive):
        if db.endswith('.db') and db != 'month_2014_01.db':
            cur.execute("ATTACH DATABASE ? AS tmp", [os.path.join(gdrive, db)])
            cur.execute("""INSERT INTO master_query
                           SELECT * FROM tmp.messages
                           INNER JOIN tmp.users
                               ON tmp.messages.id = tmp.users.id
                           WHERE text LIKE '%house%'
                        """)
            conn.commit()   # COMMIT BEFORE DETACH: SQLITE CANNOT DETACH MID-TRANSACTION
            cur.execute("DETACH DATABASE tmp")

    # WRITE ROWS TO CSV (TEXT MODE WITH newline='' IN PYTHON 3, NOT 'wb')
    data = cur.execute("SELECT * FROM master_query")
    with open(os.path.join(gdrive, 'Query_Results.csv'), 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow([i[0] for i in cur.description])  # HEADERS
        writer.writerows(data)                            # DATA

    cur.close()
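The same ATTACH/INSERT loop, demonstrated end to end on two throwaway databases in a temp directory. File names and the minimal schema are assumptions for illustration; here the master table lives in a separate master.db so no source file is modified:

```python
import os
import sqlite3
import tempfile

tmpdir = tempfile.mkdtemp()

# build two tiny source databases with the assumed schema (hypothetical data)
for name, text in [('m1.db', 'red house'), ('m2.db', 'blue house')]:
    with sqlite3.connect(os.path.join(tmpdir, name)) as conn:
        conn.execute("CREATE TABLE messages (id INTEGER, text TEXT)")
        conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
        conn.execute("INSERT INTO messages VALUES (1, ?)", (text,))
        conn.execute("INSERT INTO users VALUES (1, 'bob')")

# collect results in a separate master database
with sqlite3.connect(os.path.join(tmpdir, 'master.db')) as conn:
    cur = conn.cursor()
    # explicit column names avoid the duplicate `id` from SELECT * on the join
    cur.execute("""CREATE TABLE master_query
                   (msg_id INTEGER, text TEXT, user_id INTEGER, name TEXT)""")
    for db in os.listdir(tmpdir):
        if db.endswith('.db') and db != 'master.db':
            cur.execute("ATTACH DATABASE ? AS tmp", [os.path.join(tmpdir, db)])
            cur.execute("""INSERT INTO master_query
                           SELECT * FROM tmp.messages
                           INNER JOIN tmp.users
                               ON tmp.messages.id = tmp.users.id
                           WHERE text LIKE '%house%'""")
            conn.commit()  # commit before DETACH: SQLite cannot detach mid-transaction
            cur.execute("DETACH DATABASE tmp")
    rows = cur.execute("SELECT * FROM master_query").fetchall()
    print(len(rows))  # 2 -- both sample rows contain 'house'
```

Each iteration attaches one file under the alias `tmp`, appends its matching rows, commits, and detaches, so the connection never holds more than one extra database at a time.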