How to check if a list in Python and a column in a PostgreSQL table contain the same files?

I have a column named filename in a PostgreSQL table named xml_joblist, which contains many files, and a list named files_name in Python, which contains several files that are sorted after some processing. I want to compare filename and files_name and check whether any files match.

|filename |
|---------|
|file_111 |
|file_555 |
|file_888 |
|file_333 | 
|file_445 |
|   .     |
|   .     |
| goes-on |

The above is the filename column in the PostgreSQL table.

files_name = ["file_789", "file_456", "file_555", "file_111"]

The above is the files_name list in Python.

How can I write a SQL statement to do this in Python?

Expected result:

matchin_files = ["file_555", "file_111"]

To compare the filenames in PostgreSQL with the files_name list in Python and find the matching files, you can use the IN operator in a SELECT statement.

Here is an example of how you can write a SQL statement to do this in Python using the psycopg2 library:

import psycopg2

# Connect to the database
conn = psycopg2.connect("dbname=mydatabase user=myuser password=mypassword")

# Create a cursor
cur = conn.cursor()

# Define the list of files
files_name = ["file_789", "file_456", "file_555", "file_111"]

# Build the SELECT statement
sql = "SELECT filename FROM xml_joblist WHERE filename IN %s"

# Execute the SELECT statement
cur.execute(sql, (tuple(files_name),))

# Fetch the matching files
matching_files = cur.fetchall()

# Print the matching files
print(matching_files)

# Close the cursor and connection
cur.close()
conn.close()

This will execute a SELECT statement that retrieves the filename column where the filename is in the files_name list. The resulting rows will be stored in the matching_files variable, which will contain the list of matching files.

[("file_555",), ("file_111",)] # output

Note that this returns a list of tuples. You can use a list comprehension to convert it to a flat list:

matching_files = [f[0] for f in matching_files]

# Returns: ["file_555", "file_111"]
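To try the same IN-query flow without a PostgreSQL server, here is a minimal sketch that uses the standard-library sqlite3 module as a stand-in. This is an assumption made purely for illustration: sqlite3 uses ? placeholders instead of psycopg2's %s and does not adapt a tuple into an IN list, so one placeholder is generated per value. The sample rows are the ones from the question.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for the PostgreSQL table
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE xml_joblist (filename TEXT)")
cur.executemany(
    "INSERT INTO xml_joblist (filename) VALUES (?)",
    [("file_111",), ("file_555",), ("file_888",), ("file_333",), ("file_445",)],
)

files_name = ["file_789", "file_456", "file_555", "file_111"]

# One "?" placeholder per value; the values themselves are passed separately
placeholders = ",".join("?" for _ in files_name)
cur.execute(
    f"SELECT filename FROM xml_joblist WHERE filename IN ({placeholders})",
    files_name,
)

# Flatten the 1-tuples and sort, since row order is not guaranteed without ORDER BY
matching_files = sorted(row[0] for row in cur.fetchall())
print(matching_files)  # ['file_111', 'file_555']

con.close()
```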

Connect to the postgres db

from sqlalchemy import create_engine, select, MetaData
from sqlalchemy.engine import URL

# Creating URL object

url_object  = URL.create(
    "postgresql",
    username="YourUserName",
    password="YourPassword",  # plain (unescaped) text
    host="YourHostName",
    database="YourDBName",
)

# Define the Connection Object
db = create_engine(url_object)

Get the xml_joblist table

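# Create the Metadata Object and reflect the existing tables
# (MetaData(bind=...) was removed in SQLAlchemy 2.0; passing bind to reflect() works in 1.4 and 2.0)
meta = MetaData()
meta.reflect(bind=db)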

# Get the `xml_joblist` table from the Metadata object
xml_joblist = meta.tables['xml_joblist']

Select filename from xml_joblist table and compare it to files_name

# List with File names
files_name = ['file_789', 'file_456', 'file_555', 'file_111']

# Read
with db.connect() as conn:
    #SELECT xml_joblist.filename FROM xml_joblist WHERE xml_joblist.filename IN (files_name)
    select_statement = select(xml_joblist.c.filename).where(xml_joblist.c.filename.in_(files_name))
    result_set = conn.execute(select_statement)
    matchin_files = result_set.scalars().all()

This gives matchin_files as a list like this:

['file_111', 'file_555']

import psycopg2

con = psycopg2.connect(
    host="localhost",
    port="5432",
    database="postgres",
    user="postgres",
    password="postgres",
)

cur = con.cursor()
cur.execute("CREATE TABLE xml_joblist (filename VARCHAR(10))")
cur.executemany(
    "INSERT INTO xml_joblist (filename) VALUES (%s)",
    [(f"file_10{n}",) for n in range(10)],
)

files_name = ["file_103", "file_107", "file_108"]  # keep as list or assign tuple directly

cur.execute(
    "SELECT filename FROM xml_joblist WHERE filename IN %s",
    (tuple(files_name),)
)  # convert files_name to tuple if not already a tuple
results = cur.fetchall()  # [('file_103',), ('file_107',), ('file_108',)]
matching_files = [t[0] for t in results]  # ['file_103', 'file_107', 'file_108']

con.close()

The code provided connects to a PostgreSQL database, creates a table called xml_joblist with a single column called filename, inserts 10 rows into the table, then queries the table for rows whose filename values appear in the files_name list. The query returns a list of tuples, each containing a single filename value; these values are then extracted and stored in a new list called matching_files. Finally, the code closes the connection to the database.

If you're using psycopg2 directly, you can build a SELECT ... IN query with N placeholders and then execute it with files_name. You then extract the file names with a list comprehension.

See the following demo, which uses a table that contains file_100 through file_109.

# setup
import psycopg2

con = psycopg2.connect(
    host="localhost",
    port="5432",
    database="postgres",
    user="postgres",
    password="postgres",
)

cur = con.cursor()
cur.execute("CREATE TABLE xml_joblist (filename VARCHAR(10))")
cur.executemany(
    "INSERT INTO xml_joblist (filename) VALUES (%s)",
    [(f"file_10{n}",) for n in range(10)],
)

# query
files_name = ["file_103", "file_107", "file_108"]

values_placeholders = ",".join("%s" for _ in range(len(files_name)))
sql_query = f"SELECT filename FROM xml_joblist WHERE filename IN ({values_placeholders})"

cur.execute(sql_query, files_name)
results = cur.fetchall()  # [('file_103',), ('file_107',), ('file_108',)]
matching_files = [t[0] for t in results]  # ['file_103', 'file_107', 'file_108']

# closing
con.close()
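One caveat with generated placeholders: an empty list produces IN (), which is a SQL syntax error. Below is a minimal sketch of a guard, assuming a hypothetical helper named build_in_query (not part of psycopg2) that returns the SQL text and the parameter list separately.

```python
def build_in_query(files_name):
    """Return (sql, params) for a parameterized IN query on xml_joblist.

    Hypothetical helper for illustration; it only builds strings and does
    not touch the database.
    """
    params = list(files_name)
    if not params:
        # "IN ()" is a syntax error, so fall back to a query that matches no rows
        return "SELECT filename FROM xml_joblist WHERE FALSE", []
    # One %s placeholder per value; psycopg2 fills them in at execute time
    placeholders = ",".join(["%s"] * len(params))
    sql = f"SELECT filename FROM xml_joblist WHERE filename IN ({placeholders})"
    return sql, params


sql, params = build_in_query(["file_103", "file_107", "file_108"])
print(sql)     # SELECT filename FROM xml_joblist WHERE filename IN (%s,%s,%s)
print(params)  # ['file_103', 'file_107', 'file_108']
```

The returned pair can then be passed on as cur.execute(sql, params).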

If you use anything else (like SQLAlchemy), please leave a comment.

Set up table:

create table xml_joblist(filename varchar);
insert into xml_joblist values ('file_111'), ('file_555'), ('file_888'), ('file_333'), ('file_445');

Python code:

import psycopg2
con = psycopg2.connect(dbname="test", host='localhost', user='postgres', port=5432)
cur = con.cursor()

files_name = ["file_789", "file_456", "file_555", "file_111"]

cur.execute("select filename from xml_joblist where filename = ANY(%s)", [files_name])

matching_files = [row[0] for row in cur.fetchall()]

matching_files
['file_111', 'file_555']

This uses psycopg2 list adaptation:

Python lists are converted into PostgreSQL ARRAYs:

cur.mogrify("SELECT %s;", ([10, 20, 30], ))
'SELECT ARRAY[10,20,30];'

Note

You can use a Python list as the argument of the IN operator using the PostgreSQL ANY operator.

ids = [10, 20, 30]
cur.execute("SELECT * FROM data WHERE id = ANY(%s);", (ids,))

Furthermore ANY can also work with empty lists, whereas IN () is a SQL syntax error.

Note

...

to select those file names in the files_name list that are also in the xml_joblist table, using the ANY operator:

9.24.3. ANY/SOME (array)

expression operator ANY (array expression)

expression operator SOME (array expression)

The right-hand side is a parenthesized expression, which must yield an array value. The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ANY is “true” if any true result is obtained. The result is “false” if no true result is found (including the case where the array has zero elements).

If the array expression yields a null array, the result of ANY will be null. If the left-hand expression yields null, the result of ANY is ordinarily null (though a non-strict comparison operator could possibly yield a different result). Also, if the right-hand array contains any null elements and no true comparison result is obtained, the result of ANY will be null, not false (again, assuming a strict comparison operator). This is in accordance with SQL's normal rules for Boolean combinations of null values.

SOME is a synonym for ANY.
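The null-handling rules quoted above can be easier to follow in code. The following is a rough pure-Python emulation of filename = ANY(array), with None standing in for SQL NULL. The function name sql_any_equals is made up for this sketch; it only mirrors the documented rules for a strict = operator and does not talk to PostgreSQL.

```python
from typing import Optional, Sequence


def sql_any_equals(value, array: Optional[Sequence]) -> Optional[bool]:
    """Emulate `value = ANY(array)` using SQL three-valued logic.

    True / False map to SQL true / false, and None maps to SQL NULL.
    """
    if array is None:
        # A null array yields NULL
        return None
    if len(array) == 0:
        # Zero elements: no comparison is performed, so the result is false
        return False
    if value is None:
        # Null left-hand side with a strict operator yields NULL
        return None
    saw_null = False
    for element in array:
        if element is None:
            saw_null = True      # comparison with NULL is unknown
        elif element == value:
            return True          # any true comparison makes ANY true
    # No match: NULL if an unknown comparison occurred, otherwise false
    return None if saw_null else False


print(sql_any_equals("file_555", ["file_111", "file_555"]))  # True
print(sql_any_equals("file_999", ["file_111", None]))        # None
print(sql_any_equals("file_999", []))                        # False
```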

To compare the values in the filename column of the xml_joblist table with the values in the files_name list in Python and find the matching values, you can use the following SQL statement:

SELECT filename FROM xml_joblist WHERE filename IN (%s)

Then use Python's str.join() method to join the quoted files_name values into a single comma-separated string:

import psycopg2

# Connect to the database
conn = psycopg2.connect(dbname="mydatabase", user="myuser", password="mypassword", host="localhost")

# Create a cursor
cur = conn.cursor()

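# List of file names to look for
files_name = ["file_789", "file_456", "file_555", "file_111"]

# Build the parameter string. The values are interpolated directly into the SQL text,
# so this is only safe for trusted input (otherwise pass parameters to cur.execute)
param_string = ",".join(["'" + file + "'" for file in files_name])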

# Execute the SQL statement
cur.execute("SELECT filename FROM xml_joblist WHERE filename IN (%s)" % param_string)

# Fetch the results
matching_files = [row[0] for row in cur.fetchall()]

# Close the cursor and connection
cur.close()
conn.close()

This code will execute the SQL statement, fetch the matching values from the filename column, and store them in the matching_files list.

If you are using a library such as psycopg2, then cur.fetchall() will return a list of tuples, with each tuple containing a single item.

So essentially your question simplifies to how to compare a list of tuples against a list.

One way to do so is something like the following:

list_of_tuples = [(i,) for i in range(1000000)]
my_list = list(range(1000000))

if list_of_tuples == [(elem,) for elem in my_list]:
    print("The lists are equal")

Another way is to use a set.

list_of_tuples = [(i,) for i in range(1000000)]
my_list = list(range(1000000))

if {t[0] for t in list_of_tuples} == set(my_list):  # unwrap the 1-tuples before comparing
    print("The lists are equal")
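Since the question asks for the files that match rather than whether the two collections are equal, a set can also be used for a membership test. This is a small sketch using the sample data from the question; the rows variable stands in for what cur.fetchall() would return, which is an assumption of this example.

```python
# Rows shaped like cur.fetchall() output: one 1-tuple per row (sample data)
rows = [("file_111",), ("file_555",), ("file_888",), ("file_333",), ("file_445",)]
files_name = ["file_789", "file_456", "file_555", "file_111"]

# Unwrap the tuples into a set for fast membership checks
db_files = {row[0] for row in rows}

# Keep the entries of files_name that exist in the table, preserving list order
matching_files = [name for name in files_name if name in db_files]
print(matching_files)  # ['file_555', 'file_111']
```

This fetches every row first, so it is only reasonable for small tables; for large ones, filtering in SQL avoids transferring the whole column.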

You can use SQLAlchemy's IN comparison operator (see the SQLAlchemy documentation for more information):

stmt = select(xml_joblist.filename).where(xml_joblist.filename.in_(files_name))
result = conn.execute(stmt)

SELECT datname FROM pg_database
