How do I make a faster query from pandas to PostgreSQL?

Question

I have a CSV file and I need to check which of its rows are in the database. From my CSV I have to use the name, surname, and birthdate to find the university name in the DB. For example:

From this example image, I should find that XXX YYY studies at university 1, AAA BBB at university 2, and that there is no result for TTT YYY.

My solution is below, but it is very slow. The CSV file has 50k rows and the database table has 40M rows.

I use Python pandas to read the CSV file, then I create a new column that combines the name, surname, and birthdate. Example data from the new combined column: "XXX+YYYY+29-05-1953"

Then I get a list of all values from the new combined column. Let's say the list is: combine_list = data[new_column].tolist()
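The combined-key step can be sketched in pandas as follows. This is a minimal sketch, assuming the CSV columns are named `name`, `surname` and `birthdate` and that `birthdate` is already a `DD-MM-YYYY` string. The small inline frame stands in for `pd.read_csv(...)`, the helper name `add_combined_key` is hypothetical, and the third row's birthdate is an illustrative value.

```python
import pandas as pd


def add_combined_key(data: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of `data` with a 'name+surname+birthdate' key column.

    Assumes birthdate is already a DD-MM-YYYY string, as in the question.
    """
    data = data.copy()
    data[column] = (
        data["name"].astype(str)
        + "+"
        + data["surname"].astype(str)
        + "+"
        + data["birthdate"].astype(str)
    )
    return data


# Inline stand-in for pd.read_csv(...); TTT's birthdate is made up.
data = pd.DataFrame(
    {
        "name": ["XXX", "AAA", "TTT"],
        "surname": ["YYYY", "BBB", "YYY"],
        "birthdate": ["29-05-1953", "01-01-1997", "15-03-1980"],
    }
)
new_column = "new_column"
data = add_combined_key(data, new_column)
combine_list = data[new_column].tolist()
```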

And now my amazing query:))

query = f"""Select concat(name, '+', surname, '+', birthdate) as new_column, university
        from db_table where name is not NULL and surname is not NULL and birthdate is not NULL
        and concat(name, '+', surname, '+', birthdate) in {tuple(combine_list)}"""
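Interpolating `tuple(combine_list)` into the string inserts Python's `repr` of the tuple, so a one-element list becomes `('x',)` (invalid SQL), and a value containing an apostrophe is rendered in double quotes, which PostgreSQL reads as an identifier. Below is a minimal sketch of the same lookup with the list sent as one bound parameter, assuming the psycopg2 driver (it adapts a Python list to a PostgreSQL array, which `= ANY(%s)` accepts). The helper name `build_concat_query` and the sample values are hypothetical. This addresses correctness only; it does not by itself make the lookup faster.

```python
def build_concat_query(combine_list):
    """Return (sql, params) for the question's concat-based lookup.

    The values travel as a single bound parameter instead of being pasted
    into the SQL text; psycopg2 turns the Python list into an array.
    """
    sql = """
        SELECT concat(name, '+', surname, '+', birthdate) AS new_column, university
        FROM db_table
        WHERE name IS NOT NULL AND surname IS NOT NULL AND birthdate IS NOT NULL
          AND concat(name, '+', surname, '+', birthdate) = ANY(%s)
    """
    return sql, (list(combine_list),)


sql, params = build_concat_query(["XXX+YYYY+29-05-1953", "O'Brien+ZZZ+01-01-1990"])
# With an open psycopg2 cursor `cur` (not created here):
# cur.execute(sql, params)
# rows = cur.fetchall()
```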

Could you please give me some advice on how to find them faster?

Answer 1

You could compare the columns as a tuple instead of concatenating them:

Select concat(name, '+', surname, '+', birthdate) as new_column, university
from db_table
where (name, surname, birthdate) IN (('XXX', 'YYY', '29-05-1953'),
                                     ('AAA', 'BBB', '01-01-1997'), ...)

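This should be faster than matching against concatenated values, especially if there is a composite index on (name, surname, birthdate): the planner can use such an index for the tuple comparison, but not for a condition on the concat(...) expression.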
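A minimal sketch of the tuple approach driven from pandas, under these assumptions: the psycopg2 driver with its `%s` placeholders, a `birthdate` column stored in the same `DD-MM-YYYY` text form as the CSV (for a `DATE` column, convert the CSV strings to `datetime.date` values first), and a batch size of 1000, which is an arbitrary starting point rather than a tuned value. The index name `db_table_person_idx` and the helper names are hypothetical, and the sample rows are illustrative. The rows returned by the database are joined back onto the CSV with a left merge, so people with no match, like TTT YYY, get a missing `university`.

```python
import pandas as pd

KEY_COLUMNS = ["name", "surname", "birthdate"]

# Hypothetical index name; a composite index the planner can use for the tuple comparison.
INDEX_SQL = (
    "CREATE INDEX IF NOT EXISTS db_table_person_idx "
    "ON db_table (name, surname, birthdate)"
)


def build_row_value_queries(data: pd.DataFrame, chunk_size: int = 1000):
    """Yield (sql, params) pairs that look up the CSV rows in batches.

    Each query uses the (name, surname, birthdate) IN ((...), ...) form,
    with one (%s, %s, %s) placeholder group per distinct CSV row.
    """
    keys = list(data[KEY_COLUMNS].drop_duplicates().itertuples(index=False, name=None))
    for start in range(0, len(keys), chunk_size):
        chunk = keys[start:start + chunk_size]
        placeholders = ", ".join(["(%s, %s, %s)"] * len(chunk))
        sql = (
            "SELECT name, surname, birthdate, university FROM db_table "
            f"WHERE (name, surname, birthdate) IN ({placeholders})"
        )
        # Flatten [(n1, s1, b1), (n2, s2, b2)] into [n1, s1, b1, n2, s2, b2].
        params = [value for row in chunk for value in row]
        yield sql, params


def attach_university(data: pd.DataFrame, db_rows: pd.DataFrame) -> pd.DataFrame:
    """Left-join the database matches onto the CSV rows; unmatched rows get NaN."""
    return data.merge(db_rows, on=KEY_COLUMNS, how="left")


data = pd.DataFrame(
    {
        "name": ["XXX", "AAA", "TTT"],
        "surname": ["YYY", "BBB", "YYY"],
        "birthdate": ["29-05-1953", "01-01-1997", "15-03-1980"],
    }
)
queries = list(build_row_value_queries(data, chunk_size=2))

# Stand-in for the rows the database would return; with psycopg2 they would
# come from cur.execute(sql, params) and cur.fetchall() for each query.
db_rows = pd.DataFrame(
    {
        "name": ["XXX", "AAA"],
        "surname": ["YYY", "BBB"],
        "birthdate": ["29-05-1953", "01-01-1997"],
        "university": ["university 1", "university 2"],
    }
)
result = attach_university(data, db_rows)
```

Building the index on a 40M-row table is a one-off cost; after that the planner can answer each batch from the index rather than scanning the whole table.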

1 answers

solution1
0 2021-06-01 13:31:22