Limitations of the pgsql query "SELECT .. IN"

Question

I have a .csv file with about 14000 objectIDs. My goal is to retrieve certain fields associated with these objectIDs. So far what I have done is concatenate all the objectIDs into a comma-separated list that I append to the end of this query:

SELECT objectName, objectType FROM objectTable WHERE objectID IN (1001, 1002, 1003...)

This, however, is very (very) slow, as my database has about 16 million unique objectIDs. Is there a better way to structure such a query? Must I run this in batches (I tried this too, but it was unbearably slow), or is my entire approach wrong?

Answer 1

Load the objectIDs into a table and then join against that.

SELECT objectName, objectType
FROM objectTable INNER JOIN objectids ON (objecttable.objectid = objectids.id)

Answer 2

With 16m objectIDs, it's probably just taking a lot longer to upload the query string than to actually run it.

Create a table from your CSV file, with all the objectIDs preloaded. Say you call this table "objectIDs", and its main column is "id". Now you can say:

SELECT objectName, objectType FROM objectTable
INNER JOIN objectIDs ON objectIDs.id = objectTable.objectID

The inner join will automatically cull any rows in objectTable that have no match, and will join in a 1:1 relationship with your IDs table.
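As a rough sketch of the table this answer describes: the integer column type and the sample INSERT are assumptions for illustration, not something stated in the thread. Declaring "id" as a primary key is one way to back up the 1:1 claim, since duplicate IDs in the CSV would otherwise produce duplicate result rows.

```sql
-- Hypothetical lookup table; names follow the answer, the integer type is assumed.
CREATE TABLE objectIDs (
    id integer PRIMARY KEY   -- rejects duplicate IDs, so each ID matches at most once
);

-- Sample rows only; in practice the IDs would be loaded from the CSV file.
INSERT INTO objectIDs (id) VALUES (1001), (1002), (1003);

-- Same join as in the answer, written with table aliases.
SELECT t.objectName, t.objectType
FROM   objectTable t
INNER JOIN objectIDs i ON i.id = t.objectID;
```

Note that with the primary key in place, loading a CSV that contains a repeated ID will fail with a unique violation, so the file may need deduplicating first.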

Answer 3

If you already have a comma-separated string containing all the numbers, you could use a prepared statement, with the syntax of whatever client you use. Example in plain SQL:

PREPARE myplan (text) AS
    SELECT o.objectname, o.objecttype
    FROM   (SELECT unnest(string_to_array($1, ','))::int AS objectid) x
    JOIN   objecttable o USING (objectid);

EXECUTE myplan('1001, 1002, 1003');

Or, if you start from a valid CSV file on the database server, create a temporary table, COPY the data to it (COPY is very fast), and then JOIN to it.

CREATE TEMP TABLE tmp_x (objectid int);

COPY tmp_x FROM '/path/to/my/file.csv';

SELECT o.objectname, o.objecttype
FROM   tmp_x
JOIN   objecttable o USING (objectid);

DROP TABLE tmp_x;   -- optional; dropped automatically at end of session

If your file is on a different machine, use psql's meta-command \copy instead.
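A minimal sketch of the client-side variant, assuming the same single-column file and placeholder path as in the COPY example. It has to be run from within psql, and the whole \copy command must stay on one line.

```sql
-- Same temporary table as in the server-side example.
CREATE TEMP TABLE tmp_x (objectid int);

-- \copy reads the file on the machine where psql runs and streams it to the server.
-- It is a psql meta-command, so it takes no trailing semicolon.
\copy tmp_x FROM '/path/to/my/file.csv'

-- If the file has a header row, this form should skip it (PostgreSQL 9.0 or later):
-- \copy tmp_x FROM '/path/to/my/file.csv' WITH (FORMAT csv, HEADER true)
```

Unlike COPY, this does not require the file to be readable by the database server process.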

You surely have an index on objecttable.objectid? That's crucial.
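If the index turns out to be missing, something along these lines could be used to create it and to check that the join uses it. The index name is made up for illustration, and the plan you actually get depends on your data and PostgreSQL version.

```sql
-- Assumed index name; unnecessary if objectid is already a primary key or has a unique constraint.
CREATE INDEX objecttable_objectid_idx ON objecttable (objectid);

-- Temporary tables are not analyzed automatically, so refresh statistics after loading.
ANALYZE tmp_x;

-- Show the plan and real timings; look for an index scan on objecttable.
EXPLAIN ANALYZE
SELECT o.objectname, o.objecttype
FROM   tmp_x
JOIN   objecttable o USING (objectid);
```

A sequential scan over all 16 million rows in the output would suggest the index is absent or not being used.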

3 answers

Answer 1: score 2, accepted, 2012-09-27 00:59:53

Answer 2: score 1, 2012-09-27 01:00:00

Answer 3: score 1, 2012-09-27 01:31:00