Limitations of “SELECT .. IN” pgsql query
I have a .csv file with about 14000 objectIDs. My goal is to retrieve certain fields associated with these objectIDs. So far what I have done is concatenate all the objectIDs into a comma-separated list that I append to the end of this query:
SELECT objectName, objectType FROM objectTable WHERE objectID IN (1001, 1002, 1003...)
This, however, is very (very) slow, as my database has about 16 million unique objectIDs. Is there a better way to structure such a query? Must I run this in batches (I tried this too, but it was unbearably slow), or is my entire approach wrong?
Load the objectIDs into a table and then join against that.
SELECT objectName, objectType
FROM objectTable INNER JOIN objectids ON (objecttable.objectid = objectids.id)
With 16m objectIDs, it's probably just taking a lot longer to upload the query string than to actually run it.
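The "load into a table" step in the answer above is not spelled out, so here is a minimal sketch of one way it might look. It assumes objectID is an integer; the table name objectids and column id are taken from the join above, and the three literal IDs are placeholders for the real list.

```sql
-- Helper table matching the join above (objectids.id); integer type is an assumption.
CREATE TABLE objectids (id integer PRIMARY KEY);

-- Placeholder rows; in practice the IDs would be bulk-loaded from the .csv file.
INSERT INTO objectids (id) VALUES (1001), (1002), (1003);

-- Refresh planner statistics so the join is planned with the real row count.
ANALYZE objectids;
```

The primary key is optional for a one-off lookup, but it does reject duplicate IDs in the file, which would otherwise produce duplicate rows in the join result.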
Create a table from your CSV file, with all the objectIDs preloaded. Say you call this table "objectIDs", and its main column is "id". Now you can say:
SELECT objectName, objectType FROM objectTable
INNER JOIN objectIDs ON objectIDs.id = objectTable.objectID
The inner join will automatically cull any rows in objectTable that have no match, and will join in a 1:1 relationship with your IDs table.
If you already have a comma-separated string containing all numbers, you could use a prepared statement, with the syntax of whatever client you use. Example in plain SQL:
PREPARE myplan (text) AS
SELECT o.objectname, o.objecttype
FROM (SELECT unnest(string_to_array($1, ','))::int AS objectid) x
JOIN objecttable o USING (objectid);
EXECUTE myplan('1001, 1002, 1003');
Or, if you start from a valid CSV file on the database server, create a temporary table, COPY the data to it (COPY is very fast), and then JOIN to it.
CREATE TEMP TABLE tmp_x (objectid int);
COPY tmp_x FROM '/path/to/my/file.csv';
SELECT o.objectname, o.objecttype
FROM tmp_x
JOIN objecttable o USING (objectid);
DROP TABLE tmp_x; -- optional; dropped automatically at end of session
If your file is on a different machine, use psql's meta-command \copy instead.
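A rough sketch of the \copy variant, run from within psql on the client machine. The path is a placeholder, tmp_x is the temporary table created above, and the file is assumed to be plain CSV with one objectID per line and no header row.

```sql
-- \copy reads the file on the client and streams it to the server.
-- It must stay on a single line and takes no trailing semicolon.
\copy tmp_x FROM '/path/to/my/file.csv' WITH (FORMAT csv)
```

Because tmp_x is a temporary table, \copy has to run in the same psql session that created it. If the file does have a header row, adding HEADER true inside the parentheses should skip it.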
You surely have an index on objecttable.objectid? That's crucial.
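If that index turns out to be missing, something along these lines could add it and then check the plan. The index name is made up for illustration, and the query is the temp-table join from above.

```sql
-- Hypothetical index name. Skip this if objectid is already the primary key,
-- since a primary key is backed by a unique index.
CREATE INDEX objecttable_objectid_idx ON objecttable (objectid);

-- Run the join and show the plan that was actually used.
EXPLAIN ANALYZE
SELECT o.objectname, o.objecttype
FROM tmp_x
JOIN objecttable o USING (objectid);
```

With about 14000 IDs against roughly 16 million rows, one would hope to see index lookups on objecttable rather than a sequential scan, though the planner's choice depends on the statistics it has.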