I have a column named filename in a PostgreSQL table named xml_joblist, which contains many files, and a list named files_name in Python, which contains several files that are sorted after some process. I want to compare filename and files_name and check whether there are any matching files.
|filename |
|---------|
|file_111 |
|file_555 |
|file_888 |
|file_333 |
|file_445 |
| . |
| . |
| goes-on |
The above is the filename column in the PostgreSQL table.
files_name = ["file_789", "file_456", "file_555", "file_111"]
The above is the files_name list in Python.
How can I write an SQL statement to do this from Python?
Expected result:
matching_files = ["file_555", "file_111"]
To compare the values in the filename column in PostgreSQL with the files_name list in Python and find the matching files, you can use the IN operator in a SELECT statement.
Here is an example of how you can write the SQL statement in Python using the psycopg2 library:
import psycopg2
# Connect to the database
conn = psycopg2.connect("dbname=mydatabase user=myuser password=mypassword")
# Create a cursor
cur = conn.cursor()
# Define the list of files
files_name = ["file_789", "file_456", "file_555", "file_111"]
# Build the SELECT statement (psycopg2 expands a Python tuple into a parenthesized list for IN)
sql = "SELECT filename FROM xml_joblist WHERE filename IN %s"
# Execute the SELECT statement
cur.execute(sql, (tuple(files_name),))
# Fetch the matching files
matching_files = cur.fetchall()
# Print the matching files
print(matching_files)
# Close the cursor and connection
cur.close()
conn.close()
This executes a SELECT statement that retrieves the filename column for every row whose filename is in the files_name list. The resulting rows are stored in the matching_files variable:
[("file_555",), ("file_111",)] # output
Note that this returns a list of tuples. You can use a list comprehension to convert it to a plain list:
matching_files = [f[0] for f in matching_files]
# Returns: ["file_555", "file_111"]
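The database returns matching rows in whatever order the query happens to produce, which need not follow files_name. Since files_name was sorted by an earlier process, here is a minimal sketch for restoring that order in Python, assuming rows holds the list of one-element tuples returned by cur.fetchall() (the sample values are taken from the question):

```python
# Sample rows as cur.fetchall() might return them: one-element tuples, any order
rows = [("file_111",), ("file_555",)]
files_name = ["file_789", "file_456", "file_555", "file_111"]

# Unwrap the tuples into a set for fast membership checks
found = {row[0] for row in rows}

# Iterate over files_name so the result keeps its (sorted) order
matching_files = [name for name in files_name if name in found]
print(matching_files)  # ['file_555', 'file_111']
```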
Connect to the postgres db
from sqlalchemy import create_engine, select, MetaData
from sqlalchemy.engine import URL
# Creating URL object
url_object = URL.create(
"postgresql",
username="YourUserName",
password="YourPassword", # plain (unescaped) text
host="YourHostName",
database="YourDBName",
)
# Define the Connection Object
db = create_engine(url_object)
Get the xml_joblist table:
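# Create the MetaData object and reflect the xml_joblist table from the database
# (MetaData(bind=...) was removed in SQLAlchemy 2.0; passing bind to reflect() works in 1.4 and 2.0)
meta = MetaData()
meta.reflect(bind=db, only=["xml_joblist"])
# Get the `xml_joblist` table from the MetaData object
xml_joblist = meta.tables['xml_joblist']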
Select filename from the xml_joblist table and compare it to files_name:
# List with File names
files_name = ['file_789', 'file_456', 'file_555', 'file_111']
# Read
with db.connect() as conn:
    # SELECT xml_joblist.filename FROM xml_joblist WHERE xml_joblist.filename IN (files_name)
    select_statement = select(xml_joblist.c.filename).where(xml_joblist.c.filename.in_(files_name))
    result_set = conn.execute(select_statement)
    matching_files = result_set.scalars().all()
This gives matching_files as a list like this:
['file_111', 'file_555']
import psycopg2
con = psycopg2.connect(
host="localhost",
port="5432",
database="postgres",
user="postgres",
password="postgres",
)
cur = con.cursor()
cur.execute("CREATE TABLE xml_joblist (filename VARCHAR(10))")
cur.executemany(
"INSERT INTO xml_joblist (filename) VALUES (%s)",
[(f"file_10{n}",) for n in range(10)],
)
files_name = ["file_103", "file_107", "file_108"] # keep as list or assign tuple directly
cur.execute(
"SELECT filename FROM xml_joblist WHERE filename IN %s",
(tuple(files_name),)
) # convert files_name to tuple if not already a tuple
results = cur.fetchall() # [('file_103',), ('file_107',), ('file_108',)]
matching_files = [t[0] for t in results]  # ['file_103', 'file_107', 'file_108']
con.close()
The code provided connects to a PostgreSQL database, creates a table called xml_joblist with a single column called filename, inserts 10 rows into the table, then queries the table for rows whose filename values are in the files_name list (psycopg2 expands the tuple into a parenthesized IN list). The query returns a list of tuples, each containing a single filename value, which are then extracted into a new list called matching_files. Finally, the code closes the connection to the database.
If you're using psycopg2 directly, you can also build your SELECT ... IN query with one placeholder per value and then execute it with files_name. You then retrieve the file names with a list comprehension.
See the following demo with a table that contains file_100 through file_109.
# setup
import psycopg2
con = psycopg2.connect(
host="localhost",
port="5432",
database="postgres",
user="postgres",
password="postgres",
)
cur = con.cursor()
cur.execute("CREATE TABLE xml_joblist (filename VARCHAR(10))")
cur.executemany(
"INSERT INTO xml_joblist (filename) VALUES (%s)",
[(f"file_10{n}",) for n in range(10)],
)
# query
files_name = ["file_103", "file_107", "file_108"]
values_placeholders = ",".join("%s" for _ in range(len(files_name)))
sql_query = f"SELECT filename FROM xml_joblist WHERE filename IN ({values_placeholders})"
cur.execute(sql_query, files_name)
results = cur.fetchall() # [('file_103',), ('file_107',), ('file_108',)]
matching_files = [t[0] for t in results]  # ['file_103', 'file_107', 'file_108']
# closing
con.close()
If you use anything else (like SQLAlchemy), please leave a comment.
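The placeholder-building step from the demo above can be wrapped in a small helper that also returns early for an empty list, because an empty IN () list is a syntax error in PostgreSQL. To keep the sketch runnable without a server, it swaps in Python's built-in sqlite3 module as a stand-in for psycopg2; sqlite3 uses ? rather than %s as its placeholder, which is why the placeholder is a parameter. The helper name find_matching is hypothetical.

```python
import sqlite3


def find_matching(cur, names, placeholder="?"):
    """Hypothetical helper: return the names from `names` present in xml_joblist.

    `placeholder` is "?" for sqlite3; with psycopg2 it would be "%s".
    """
    if not names:
        # An empty IN () list is invalid in PostgreSQL, so skip the query entirely
        return []
    # One placeholder per value, e.g. "?,?,?" for three names
    placeholders = ",".join(placeholder for _ in names)
    sql = f"SELECT filename FROM xml_joblist WHERE filename IN ({placeholders})"
    # Only placeholders go into the SQL text; the values are passed separately
    cur.execute(sql, list(names))
    return [row[0] for row in cur.fetchall()]


# In-memory stand-in for the PostgreSQL table, holding file_100 .. file_109
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE xml_joblist (filename VARCHAR(10))")
cur.executemany(
    "INSERT INTO xml_joblist (filename) VALUES (?)",
    [(f"file_10{n}",) for n in range(10)],
)

matches = find_matching(cur, ["file_103", "file_107", "file_108", "file_999"])
empty = find_matching(cur, [])
print(sorted(matches))  # ['file_103', 'file_107', 'file_108']
print(empty)  # []
con.close()
```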
Set up table:
create table xml_joblist(filename varchar);
insert into xml_joblist values ('file_111'), ('file_555'), ('file_888'), ('file_333'), ('file_445');
Python code:
import psycopg2
con = psycopg2.connect(dbname="test", host='localhost', user='postgres', port=5432)
cur = con.cursor()
files_name = ["file_789", "file_456", "file_555", "file_111"]
cur.execute("select filename from xml_joblist where filename = ANY(%s)", [files_name])
matching_files = [row[0] for row in cur.fetchall()]
print(matching_files)  # ['file_111', 'file_555']
This uses psycopg2 list adaptation:
Python lists are converted into PostgreSQL ARRAYs:
>>> cur.mogrify("SELECT %s;", ([10, 20, 30], ))
'SELECT ARRAY[10,20,30];'
Note
You can use a Python list as the argument of the IN operator using the PostgreSQL ANY operator.
ids = [10, 20, 30]
cur.execute("SELECT * FROM data WHERE id = ANY(%s);", (ids,))
Furthermore, ANY can also work with empty lists, whereas IN () is a SQL syntax error.
Note
...
The query therefore uses the ANY operator to select the file names from the files_name list that are also in the xml_joblist table. From the PostgreSQL documentation:
9.24.3. ANY/SOME (array)
expression operator ANY (array expression)
expression operator SOME (array expression)
The right-hand side is a parenthesized expression, which must yield an array value. The left-hand expression is evaluated and compared to each element of the array using the given operator, which must yield a Boolean result. The result of ANY is “true” if any true result is obtained. The result is “false” if no true result is found (including the case where the array has zero elements).
If the array expression yields a null array, the result of ANY will be null. If the left-hand expression yields null, the result of ANY is ordinarily null (though a non-strict comparison operator could possibly yield a different result). Also, if the right-hand array contains any null elements and no true comparison result is obtained, the result of ANY will be null, not false (again, assuming a strict comparison operator). This is in accordance with SQL's normal rules for Boolean combinations of null values.
SOME is a synonym for ANY.
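To make the null rules above concrete, here is a small Python model of `left = ANY(array)` under those rules, with None standing in for SQL NULL. It illustrates the documented semantics for the equality operator only and is not how PostgreSQL implements ANY; the function name any_eq is hypothetical.

```python
def any_eq(left, array):
    """Model of `left = ANY(array)` using SQL three-valued logic.

    Returns True, False, or None (None plays the role of SQL NULL).
    """
    if array is None:
        # A null array makes the whole result null
        return None
    saw_null = False
    for elem in array:
        if left is None or elem is None:
            # Comparing with NULL yields NULL under a strict operator
            saw_null = True
        elif left == elem:
            # Any true comparison makes ANY true
            return True
    # No true result: null if any comparison was null, otherwise false
    # (an empty array makes no comparisons, so it gives false)
    return None if saw_null else False


print(any_eq("file_555", ["file_555", "file_111"]))  # True
print(any_eq("file_888", ["file_555", "file_111"]))  # False
print(any_eq("file_888", []))                        # False
print(any_eq("file_888", ["file_555", None]))        # None
print(any_eq(None, ["file_555"]))                    # None
print(any_eq("file_888", None))                      # None
```

In a WHERE clause a null result is treated like false, so rows whose comparison is null are simply not returned. The empty-array case returning false is also why filename = ANY(%s) with an empty Python list matches no rows instead of raising an error.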
To compare the values in the filename
column of the xml_joblist
table with the values in the files_name
list in Python and find the matching values, you can use the following SQL statement:
SELECT filename FROM xml_joblist WHERE filename IN (%s)
then use Python's str.join() method to build one %s placeholder per item in files_name, and pass the list itself as the query parameters so that psycopg2 quotes each value safely:
import psycopg2
# Connect to the database
conn = psycopg2.connect(dbname="mydatabase", user="myuser", password="mypassword", host="localhost")
# Create a cursor
cur = conn.cursor()
# Define the list of files
files_name = ["file_789", "file_456", "file_555", "file_111"]
# Build the placeholder string, e.g. "%s,%s,%s,%s"
param_string = ",".join(["%s"] * len(files_name))
# Execute the SQL statement; the values are passed separately, never pasted into the SQL text
cur.execute("SELECT filename FROM xml_joblist WHERE filename IN (%s)" % param_string, files_name)
# Fetch the results
matching_files = [row[0] for row in cur.fetchall()]
# Close the cursor and connection
cur.close()
conn.close()
This code will execute the SQL statement, fetch the matching values from the filename column, and store them in the matching_files
list.
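Pasting quoted file names directly into the SQL text breaks as soon as a name contains a quote character (and opens the door to SQL injection), while passing the values as query parameters does not. Here is a minimal demonstration, using Python's built-in sqlite3 module as a server-free stand-in for psycopg2 (its placeholder is ? instead of %s); the file name o'brien.xml is a hypothetical example:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE xml_joblist (filename TEXT)")
# Hypothetical file name containing an apostrophe
cur.executemany(
    "INSERT INTO xml_joblist (filename) VALUES (?)",
    [("file_111",), ("o'brien.xml",)],
)

files_name = ["o'brien.xml", "file_789"]

# Naive string building produces IN ('o'brien.xml','file_789'), which is malformed SQL
naive = ",".join("'" + f + "'" for f in files_name)
try:
    cur.execute(f"SELECT filename FROM xml_joblist WHERE filename IN ({naive})")
    naive_ok = True
except sqlite3.OperationalError:
    naive_ok = False

# Parameterized version: the driver handles quoting of each value
placeholders = ",".join("?" for _ in files_name)
cur.execute(
    f"SELECT filename FROM xml_joblist WHERE filename IN ({placeholders})",
    files_name,
)
matching_files = [row[0] for row in cur.fetchall()]
print(naive_ok)        # False
print(matching_files)  # ["o'brien.xml"]
con.close()
```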
If you are using a library such as psycopg2, then cur.fetchall() (called after cur.execute()) returns a list of tuples, with each tuple containing a single item.
So essentially your question simplifies to: how do you compare a list of tuples against a list?
One way to do so is to use something like the code below:
list_of_tuples = [(i,) for i in range(1000000)]
my_list = list(range(1000000))
if list_of_tuples == [(elem,) for elem in my_list]:
    print("The lists are equal")
Another way is to use sets:
list_of_tuples = [(i,) for i in range(1000000)]
my_list = list(range(1000000))
# Unwrap the one-element tuples first; a set of tuples never equals a set of ints
if {t[0] for t in list_of_tuples} == set(my_list):
    print("The lists are equal")
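Both comparisons above check whether the two collections are equal, whereas the question asks which file names appear in both. A minimal sketch of that, assuming rows holds the one-element tuples returned by cur.fetchall() for the whole filename column (the sample values are the ones from the question's table):

```python
# Sample rows as cur.fetchall() would return them for the filename column
rows = [("file_111",), ("file_555",), ("file_888",), ("file_333",), ("file_445",)]
files_name = ["file_789", "file_456", "file_555", "file_111"]

# Unwrap the one-element tuples so they compare equal to plain strings
table_names = {row[0] for row in rows}

# Set intersection keeps only names present in both collections
matching = table_names & set(files_name)
print(sorted(matching))  # ['file_111', 'file_555']
```

This fetches the entire column into Python, so for a large table, filtering in SQL with IN or ANY avoids transferring rows that cannot match.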
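You can use the SQLAlchemy IN comparison operator, in_() (see the SQLAlchemy documentation on column operators for more information). This assumes xml_joblist is an ORM-mapped class; with a reflected Table object, use xml_joblist.c.filename instead: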
stmt = select(xml_joblist.filename).where(xml_joblist.filename.in_(files_name))
result = conn.execute(stmt)