How to conduct SQL queries on multiple .db files and store the results in a .csv?

I have about 100 .db files stored on my Google Drive which I want to run the same SQL query on. I'd like to store these query results in a single .csv file.

I've managed to use the following code to write the results of a single SQL query into a .csv file, but I am unable to make it work for multiple files.

import sqlite3
import pandas as pd

conn = sqlite3.connect('/content/drive/My Drive/Data/month_2014_01.db')

df = pd.read_sql_query("SELECT * FROM messages INNER JOIN users ON messages.id = users.id WHERE text LIKE '%house%'", conn)

df.to_csv('/content/drive/My Drive/Data/Query_Results.csv')

This is the code that I have used so far to try and make it work for all files, based on this post.

databases = []

directory = '/content/drive/My Drive/Data/'
for filename in os.listdir(directory):
    flname = os.path.join(directory, filename)
    databases.append(flname)

for database in databases:
    try:
        with sqlite3.connect(database) as conn:

            conn.text_factory = str
            cur = conn.cursor()
            cur.execute(row["SELECT * FROM messages INNER JOIN users ON messages.id = users.id WHERE text LIKE '%house%'"])
            df.loc[index,'Results'] = cur.fetchall()

    except sqlite3.Error as err:
        print ("[INFO] %s" % err)

But this throws me an error: TypeError: tuple indices must be integers or slices, not str. I'm obviously doing something wrong and I would much appreciate any tips that would point towards an answer.

The TypeError comes from row["SELECT ..."]: row is a tuple, and tuples can only be indexed with integers or slices, not with a query string. Rather than patching that line, consider building a list of data frames, then concatenating them together into a single data frame with pandas.concat:

gdrive = "/content/drive/My Drive/Data/"
sql = """SELECT * FROM messages 
          INNER JOIN users ON messages.id = users.id 
          WHERE text LIKE '%house%'
      """

def build_df(db)
    with sqlite3.connect(os.path.join(gdrive, db)) as conn:
         df = pd.read_sql_query(sql, conn) 

    return df

# BUILD LIST OF DFs WITH LIST COMPREHENSION
df_list = [build_df(db) for db in os.listdir(gdrive) if db.endswith('.db')]

# CONCATENATE ALL DFs INTO SINGLE DF FOR EXPORT
final_df = pd.concat(df_list, ignore_index = True)

final_df.to_csv(os.path.join(gdrive, 'Query_Results.csv'), index = False)
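If some of the monthly files are unreadable or are missing the expected tables, a small guard around the helper keeps the loop going and simply skips those files. A sketch, mirroring the try/except from the attempt above (the build_df_safe name is only illustrative):

def build_df_safe(db):
    # return the query results for one database, or an empty frame if the query fails
    try:
        with sqlite3.connect(os.path.join(gdrive, db)) as conn:
            return pd.read_sql_query(sql, conn)
    except Exception as err:   # broad on purpose: pandas may wrap driver errors in its own exception types
        print("[INFO] skipping %s: %s" % (db, err))
        return pd.DataFrame()

df_list = [build_df_safe(db) for db in os.listdir(gdrive) if db.endswith('.db')]
final_df = pd.concat(df_list, ignore_index=True)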

Better yet, consider SQLite's ATTACH DATABASE and append each file's query results into a master table. This also avoids using pandas, a heavy third-party data science library, for a simple data migration need. Plus, you keep all database data inside SQLite without worrying about data type conversion and I/O transfer issues.

import csv
import sqlite3

with sqlite3.connect(os.path.join(gdrive, 'month_2014_01.db')) as conn:
    cur = conn.cursor()

    # CREATE MASTER TABLE, SEEDED FROM THE CURRENTLY CONNECTED DATABASE
    cur.execute("DROP TABLE IF EXISTS master_query")
    cur.execute("""CREATE TABLE master_query AS
                   SELECT * FROM messages 
                   INNER JOIN users 
                       ON messages.id = users.id 
                   WHERE text LIKE '%house%'
                """)
    conn.commit()

    # ITERATIVELY ATTACH THE REMAINING DATABASES AND APPEND THEIR RESULTS
    for db in os.listdir(gdrive):
        if db.endswith('.db') and db != 'month_2014_01.db':
            cur.execute("ATTACH DATABASE ? AS tmp", [os.path.join(gdrive, db)])
            cur.execute("""INSERT INTO master_query
                           SELECT * FROM tmp.messages 
                           INNER JOIN tmp.users 
                               ON tmp.messages.id = tmp.users.id 
                           WHERE text LIKE '%house%'
                        """)
            conn.commit()                       # commit first so the attached file is no longer in use
            cur.execute("DETACH DATABASE tmp")

    # WRITE ROWS (WITH HEADERS) TO CSV
    data = cur.execute("SELECT * FROM master_query")

    with open(os.path.join(gdrive, 'Query_Results.csv'), 'w', newline='') as f: 
        writer = csv.writer(f) 
        writer.writerow([i[0] for i in cur.description])  # HEADERS
        writer.writerows(data)                            # DATA

    cur.close()
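A quick sanity check before sharing the export is a plain row count on the master table; a minimal sketch, reopening the same seed database:

with sqlite3.connect(os.path.join(gdrive, 'month_2014_01.db')) as conn:
    total = conn.execute("SELECT COUNT(*) FROM master_query").fetchone()[0]
    print("rows collected in master_query:", total)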
