There are list1 and list2, each contains 1,104,824 values
table1 has 350,000,000 rows with 3 columns: ID, name1, name2
and this is what I tried to do:
con = mdb.connect('localhost','user','password','db')
cur = con.cursor()
for i in range(1104824)
sql ="select count(distinct(a.ID)) from (select name1 ,ID from table1 where name2 <> '"+str(list1[i])+"') as a where a.name1 = '"+str(list2[i])+"'"
cur.execute(sql)
data = cur.fetchone()[0]
but it is very very slow. Is there any faster way to do this query?
This is your query:
select count(distinct a.ID)
from (select name1, ID
from table1
where name2 <> '"+str(list1[i])+"'
) a
where a.name1 = '"+str(list2[i])+"'";
I would recommend writing this as:
select count(distinct ID)
from table1
where name2 <> '"+str(list1[i])+"' and
name1 = '"+str(list2[i])+"'";
Then you can speed up the query with an index on table1(name1, name2, id)
-- all three columns in that order.
Note: I would write the sql as:
sql = """
select count(distinct ID)
from table1
where name2 <> '{0}' and name1 = '{1}'
""".format(str(list1[i]), str(list2[i]))
Seems like this would work as well with the appropriate indices:
select count(distinct id)
from table1
where name2 <> 'Name1'
and name1 = 'Name2'
Look into using parameterized queries though. Your query is vulnerable to sql injection and would break for names with apostrophes for example... Lots of examples out there, here's a couple: Python MySQL Parameterized Queries and https://stackoverflow.com/a/1633589/1073631
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.