
Limitations of “SELECT .. IN” pgsql query

I have a .csv file with about 14000 objectIDs. My goal is to retrieve certain fields associated with these objectIDs. So far, what I have done is concatenate all the objectIDs into a comma-separated list that I append to the end of this query:

SELECT objectName, objectType FROM objectTable WHERE objectID IN (1001, 1002, 1003...)

This, however, is very (very) slow, as my database has about 16 million unique objectIDs. Is there a better way to structure such a query? Must I run this in batches (I tried this too, but it was unbearably slow), or is my entire approach wrong?

Load the objectIDs into a table and then join against that.

SELECT objectName, objectType
FROM objectTable INNER JOIN objectids ON (objecttable.objectid = objectids.id)

With 16m objectIDs, it's probably just taking a lot longer to upload the query string than to actually run it.

Create a table from your CSV file, with all the objectIDs preloaded. Say you call this table "objectIDs", and the main column is "id". Now you can say:

SELECT objectName, objectType FROM objectTable
INNER JOIN objectIDs ON objectIDs.id = objectTable.objectID;

The inner join will automatically drop any rows in objectTable that have no match, and will join in a 1:1 relationship with your IDs table.
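As a rough illustration of the staging-table idea, here is a minimal Python sketch. It uses an in-memory SQLite database purely as a stand-in for PostgreSQL so that it runs without a server; the sample rows, the `fetch_objects` helper and the CSV text are hypothetical, and only the table and column names come from the answer above. The ID 9999 has no matching row, so the inner join leaves it out.

```python
import csv
import io
import sqlite3

# Hypothetical sample input: one ID per line; 9999 is not in objectTable.
CSV_TEXT = "1001\n1003\n9999\n"


def fetch_objects(conn, csv_file):
    """Stage the IDs from csv_file in a temp table, then join against objectTable."""
    ids = [(int(row[0]),) for row in csv.reader(csv_file) if row]
    conn.execute("CREATE TEMP TABLE objectIDs (id INTEGER)")
    conn.executemany("INSERT INTO objectIDs (id) VALUES (?)", ids)
    rows = conn.execute(
        "SELECT o.objectName, o.objectType "
        "FROM objectTable AS o "
        "INNER JOIN objectIDs AS i ON i.id = o.objectID "
        "ORDER BY o.objectID"
    ).fetchall()
    conn.execute("DROP TABLE objectIDs")
    return rows


# SQLite in memory is an assumption made for the demo, not the asker's setup.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE objectTable ("
    "objectID INTEGER PRIMARY KEY, objectName TEXT, objectType TEXT)"
)
conn.executemany(
    "INSERT INTO objectTable VALUES (?, ?, ?)",
    [(1001, "alpha", "widget"), (1002, "beta", "gadget"), (1003, "gamma", "widget")],
)

result = fetch_objects(conn, io.StringIO(CSV_TEXT))
print(result)
```

Against a real PostgreSQL server the same shape should apply, with a bulk load replacing the row-by-row inserts.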

If you already have a comma-separated string containing all the numbers, you could use a prepared statement, with the syntax of whatever client you use. Example in plain SQL:

PREPARE myplan (text) AS
    SELECT o.objectname, o.objecttype
    FROM   (SELECT unnest(string_to_array($1, ','))::int AS objectid) x
    JOIN   objecttable o USING (objectid);

EXECUTE myplan('1001, 1002, 1003');
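Since the whole list travels as a single text parameter, it may be worth checking it on the client before sending it. The Python sketch below does roughly what string_to_array and the ::int cast do in the statement above: split on commas and convert each piece to an integer. The `parse_id_list` helper is hypothetical, and skipping empty pieces is a choice made for this sketch, not something the SQL version does.

```python
def parse_id_list(text):
    """Split a comma-separated string into ints, ignoring surrounding spaces."""
    ids = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate a trailing comma or empty input
        ids.append(int(token))  # raises ValueError on non-numeric input
    return ids


print(parse_id_list("1001, 1002, 1003"))
```

A ValueError here points at a malformed entry in the list before the query is ever run.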

Or, if you start from a valid CSV file on the database server, create a temporary table, COPY the data to it (COPY is very fast), and then JOIN to it.

CREATE TEMP TABLE tmp_x (objectid int);

COPY tmp_x FROM '/path/to/my/file.csv';

SELECT o.objectname, o.objecttype
FROM   tmp_x
JOIN   objecttable o USING (objectid);

DROP TABLE tmp_x;   -- optional; dropped automatically at end of session

If your file is on a different machine, use psql's meta-command \copy instead.

You surely have an index on objecttable.objectid? That's crucial.
