How do I query a database with a list of values?

Question

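I have a database with the latitude and longitude of every major airport in the world. I only need a subset of them (specifically, those in the USA), which are listed in a separate .csv file.

This csv file has two columns, and I extracted their data into two lists: the origin airport code (IATA code) and the destination airport code (also IATA).

My database has a column for the IATA code, and essentially I'm trying to query the database to get the lat/long coordinates for every airport in the two lists I have.

Here is my code: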

import os

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine('sqlite:///airport_coordinates.db')

# The dataframe that contains the IATA codes for the airports I need
airport_relpath = "data/processed/%s_%s_combined.csv" % (file, airline)
script_dir = os.path.dirname(os.getcwd())
temp_file = os.path.join(script_dir, airport_relpath)
fields = ["Origin_Airport_Code", "Destination_Airport_Code"]
df_airports = pd.read_csv(temp_file, usecols=fields)

# the origin/destination IATA codes for the airports I need
origin = df_airports.Origin_Airport_Code.values
dest = df_airports.Destination_Airport_Code.values

# query the database for the lat/long coords of the airports I need
sql = ('SELECT lat, long FROM airportCoords WHERE iata IN %s' %(origin))
indexcols = ['lat', 'long']

df_origin = pd.read_sql(sql, engine)
# testing the origin coordinates
print(df_origin)

This is the error I get:

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such 
table: 'JFK' 'JFK' 'JFK' ... 'MIA' 'JFK' 'MIA' [SQL: "SELECT lat, long 
FROM airportCoords WHERE iata IN ['JFK' 'JFK' 'JFK' ... 'MIA' 'JFK' 
'MIA']"] (Background on this error at: http://sqlalche.me/e/e3q8)

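It's definitely because I'm not querying it correctly (it treats the values in my query as a table name).

I tried looping over the list to query each element individually, but the list contains more than 604,885 elements and my computer never produced any output.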

Answer 1

Your mistake is in using string interpolation:

sql = ('SELECT lat, long FROM airportCoords WHERE iata IN %s' %(origin))

Because origin is a NumPy array, this puts [....] SQL identifier syntax into the query; see the SQLite documentation:

If you want to use a keyword as a name, you need to quote it. There are four ways of quoting keywords in SQLite:

[...]
[keyword] A keyword enclosed in square brackets is an identifier. [...]

You asked SQLite to check whether iata is in a table named ['JFK' 'JFK' 'JFK' ... 'MIA' 'JFK' 'MIA'], because that is the string representation of a NumPy array.
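To see the effect in isolation, here is a minimal sketch using Python's built-in sqlite3 module and a throwaway in-memory table. The empty table and the hard-coded array text are assumptions made for illustration. It runs the kind of SQL the interpolation produces and captures the resulting error message.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airportCoords (iata TEXT, lat REAL, long REAL)")

# Stand-in for what '%s' % origin yields with a NumPy array of codes:
# square brackets around space-separated quoted values.
array_text = "['JFK' 'MIA']"
sql = "SELECT lat, long FROM airportCoords WHERE iata IN %s" % array_text

try:
    conn.execute(sql)
    error_message = None
except sqlite3.OperationalError as exc:
    # SQLite reads [...] as a quoted identifier, so it looks for a table
    # whose name is the text between the brackets.
    error_message = str(exc)

print(error_message)
```

The message should name a missing table, just like the error in the question.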

You are already using SQLAlchemy; it would be easier to let that library generate all the SQL for you, including the IN (....) membership test:

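from sqlalchemy import (
    Column, Float, MetaData, String, Table, literal_column, select, table,
)

# in_() expects a plain sequence, so convert the NumPy array to a list
filter = literal_column('iata', String).in_(origin.tolist())
# SQLAlchemy 1.4+ call style: columns are passed positionally, not as a list
sql = select(
    literal_column('lat', Float),
    literal_column('long', Float),
).select_from(table('airportCoords')).where(filter)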

Then pass sql to pd.read_sql() as the query.
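For comparison, the statement generated this way is a parameterized IN (...) test. The sketch below shows the same idea with the built-in sqlite3 module instead of SQLAlchemy. The sample rows (approximate coordinates), the origin list and the helper name fetch_coords are all hypothetical. It also de-duplicates the codes first, since repeated codes add nothing to an IN test.

```python
import sqlite3


def fetch_coords(conn, codes):
    """Return (iata, lat, long) rows for the distinct codes given."""
    unique_codes = sorted(set(codes))
    if not unique_codes:
        return []
    # One '?' placeholder per value; the driver binds the values itself.
    placeholders = ", ".join("?" for _ in unique_codes)
    sql = (
        "SELECT iata, lat, long FROM airportCoords "
        "WHERE iata IN (%s) ORDER BY iata" % placeholders
    )
    return conn.execute(sql, unique_codes).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airportCoords (iata TEXT, lat REAL, long REAL)")
conn.executemany(
    "INSERT INTO airportCoords VALUES (?, ?, ?)",
    [("JFK", 40.64, -73.78), ("MIA", 25.79, -80.29), ("LHR", 51.47, -0.45)],
)

origin = ["JFK", "JFK", "MIA", "JFK"]  # stand-in for the CSV column
rows = fetch_coords(conn, origin)
print(rows)
```

SQLite caps the number of bound parameters per statement, so a very long list of distinct codes would have to be sent in batches or loaded into a table instead.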

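I used literal_column() and table() objects here as a shortcut to refer to the object names directly, but you could also ask SQLAlchemy to reflect the database table straight from the engine object you already created, then use the resulting table definition to generate the query: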

metadata = MetaData()
airport_coords = Table('airportCoords', metadata, autoload_with=engine)

At that point the query would be defined as:

filter = airport_coords.c.iata.in_(origin.tolist())
sql = select(airport_coords.c.lat, airport_coords.c.long).where(filter)

I'd also include the iata code in the output, otherwise you have no way of connecting the IATA codes back to the matching coordinates:

sql = select(airport_coords.c.lat, airport_coords.c.long, airport_coords.c.iata).where(filter)
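Having the iata column in the result is what lets you map coordinates back onto each row of the CSV. A minimal sketch in plain Python, with made-up result rows standing in for the query output and a short hypothetical origin list:

```python
# Hypothetical query result: (lat, long, iata) tuples, one per airport.
result_rows = [(40.64, -73.78, "JFK"), (25.79, -80.29, "MIA")]

# Index the coordinates by IATA code so each lookup is a dict access.
coords_by_iata = {iata: (lat, long) for lat, long, iata in result_rows}

origin = ["JFK", "JFK", "MIA", "JFK"]  # stand-in for the CSV column
# Look up each origin code; codes missing from the database map to None.
origin_coords = [coords_by_iata.get(code) for code in origin]
print(origin_coords)
```

Without the iata column, the result rows alone would not say which airport each coordinate pair belongs to.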

Next, since you say the list has 604,885 elements, you probably want to load that CSV data into a temporary table to keep the query efficient:

engine = create_engine('sqlite:///airport_coordinates.db')

# code to read CSV file
# ...
df_airports = pd.read_csv(temp_file, usecols=fields)

# SQLAlchemy table wrangling
metadata = MetaData()
airport_coords = Table('airportCoords', metadata, autoload_with=engine)
temp = Table(
    "airports_temp",
    metadata,
    *(Column(field, String) for field in fields),
    prefixes=['TEMPORARY']
)
# Join the airport coords against the temporary table
joined = airport_coords.join(temp, airport_coords.c.iata == temp.c.Origin_Airport_Code)

# select coordinates per airport, include the iata code
sql = select(airport_coords.c.lat, airport_coords.c.long, airport_coords.c.iata).select_from(joined)

# A TEMPORARY table is only visible to the connection that created it,
# so create, fill and query it on the same connection
with engine.begin() as conn:
    # insert CSV values into a temporary table in SQLite
    temp.create(conn, checkfirst=True)
    # index=False: the temp table has no column for the DataFrame index
    df_airports.to_sql(temp.name, conn, if_exists='append', index=False)
    df_origin = pd.read_sql(sql, conn)
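
The same temporary-table approach can be tried out end to end with the built-in sqlite3 module. This is only a sketch under assumptions: the in-memory database and the sample rows (approximate coordinates) are invented for illustration, and it uses plain SQL rather than SQLAlchemy's Table objects. Everything runs on one connection, since a TEMPORARY table is private to the connection that created it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE airportCoords (iata TEXT, lat REAL, long REAL)")
conn.executemany(
    "INSERT INTO airportCoords VALUES (?, ?, ?)",
    [("JFK", 40.64, -73.78), ("MIA", 25.79, -80.29), ("LHR", 51.47, -0.45)],
)

# Stand-in for the CSV rows: (origin, destination) code pairs.
csv_rows = [("JFK", "MIA"), ("JFK", "MIA"), ("MIA", "JFK")]

conn.execute(
    "CREATE TEMPORARY TABLE airports_temp "
    "(Origin_Airport_Code TEXT, Destination_Airport_Code TEXT)"
)
conn.executemany("INSERT INTO airports_temp VALUES (?, ?)", csv_rows)

# One output row per CSV row, in insertion order, with the origin
# airport's coordinates attached by the join.
joined = conn.execute(
    "SELECT t.Origin_Airport_Code, c.lat, c.long "
    "FROM airports_temp AS t "
    "JOIN airportCoords AS c ON c.iata = t.Origin_Airport_Code "
    "ORDER BY t.rowid"
).fetchall()
print(joined)
```

With several hundred thousand rows in the temporary table, an index on airportCoords.iata would probably help the join, though that depends on the actual data.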
