Does anyone know how to make this query faster or more efficient? It takes hours to run on SQLite. I am using it through sqlite3
in Python, so the set of available commands is somewhat limited.
SELECT C.id, COUNT (L.linea_construccion)
FROM Linea L, Predio P, Comunas C
WHERE L.calidad_construccion = 1 AND C.id = P.comuna
AND L.comuna = C.id AND P.avaluo_exento > C.avaluo_promedio
GROUP BY C.id
The database has 3 tables: Linea has about 9 million rows, Predio has about 7 million, and Comunas has approximately 250.
The tables are defined as follows:
Predio (
  id INT,
  comuna INT,
  avaluo_exento INT
)
Linea (
  id INT,
  comuna INT,
  calidad_construccion INT,
  linea_construccion INT
)
Comunas (
  id INT,
  avaluo_promedio INT
)
As a best practice, always use explicit JOIN syntax instead of implicit joins in the WHERE clause.
Given the schema of your tables, you can try the following:
SELECT
C.id,
COUNT (L.linea_construccion)
FROM Linea L
join Predio P
on L.comuna = P.comuna
join Comunas C
on L.comuna = C.id
where L.calidad_construccion = 1
AND P.avaluo_exento > C.avaluo_promedio
GROUP BY
C.id
First, rewrite the query using proper, explicit, standard, readable JOIN syntax:
SELECT C.id, COUNT(*)
FROM Linea L JOIN
Comunas C
ON L.comuna = C.id JOIN
Predio P
ON C.id = P.comuna AND P.avaluo_exento > C.avaluo_promedio
WHERE L.calidad_construccion = 1
GROUP BY C.id ;
Start with the following indexes:
Linea(calidad_construccion, comuna)
Comunas(id, avaluo_promedio) -- probably not necessary if "id" is the primary key
Predio(comuna, avaluo_exento)
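Since the question uses sqlite3 from Python, the indexes above could be created and checked roughly as follows. This is only a sketch: the in-memory database stands in for the real file, the index names are made up for illustration, and whether SQLite actually picks a given index depends on the planner and on the real data.

```python
import sqlite3

# In-memory stand-in for the real database; schema copied from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Comunas (id INT, avaluo_promedio INT);
    CREATE TABLE Predio  (id INT, comuna INT, avaluo_exento INT);
    CREATE TABLE Linea   (id INT, comuna INT, calidad_construccion INT,
                          linea_construccion INT);

    -- Index names are hypothetical; the column lists follow the answer.
    CREATE INDEX IF NOT EXISTS idx_linea_calidad_comuna
        ON Linea (calidad_construccion, comuna);
    CREATE INDEX IF NOT EXISTS idx_comunas_id_avaluo
        ON Comunas (id, avaluo_promedio);
    CREATE INDEX IF NOT EXISTS idx_predio_comuna_avaluo
        ON Predio (comuna, avaluo_exento);
""")

QUERY = """
    SELECT C.id, COUNT(*)
    FROM Linea L
    JOIN Comunas C ON L.comuna = C.id
    JOIN Predio P ON C.id = P.comuna AND P.avaluo_exento > C.avaluo_promedio
    WHERE L.calidad_construccion = 1
    GROUP BY C.id
"""

# The fourth column of each EXPLAIN QUERY PLAN row is a readable description
# of one step, including the name of any index the planner chose.
plan = [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]
for step in plan:
    print(step)
```

On the real database, running ANALYZE after creating the indexes may help the planner choose between them.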
Depending on how many comunas you have and how many are returned, you might be able to optimize this query further by eliminating the outer GROUP BY.
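One possible reading of that suggestion, sketched here with sqlite3 in Python: because the join pairs every matching Linea row with every matching Predio row of the same comuna, the count per comuna is the product of two independent counts, so each can be computed with a correlated subquery against Comunas and no GROUP BY. The sample rows are invented for illustration, and the sketch assumes COUNT(*) is acceptable (that is, linea_construccion is never NULL) and that comunas with no matches should be left out, as in the join version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Comunas (id INT, avaluo_promedio INT);
    CREATE TABLE Predio  (id INT, comuna INT, avaluo_exento INT);
    CREATE TABLE Linea   (id INT, comuna INT, calidad_construccion INT,
                          linea_construccion INT);
    -- Made-up sample rows, for illustration only.
    INSERT INTO Comunas VALUES (1, 100), (2, 200), (3, 300);
    INSERT INTO Predio  VALUES (1, 1, 150), (2, 1, 50), (3, 1, 120),
                               (4, 2, 250), (5, 3, 10);
    INSERT INTO Linea   VALUES (1, 1, 1, 10), (2, 1, 1, 11), (3, 1, 2, 12),
                               (4, 2, 1, 13), (5, 3, 1, 14);
""")

# The join-based query from the answer, ordered for comparison.
JOIN_QUERY = """
    SELECT C.id, COUNT(*)
    FROM Linea L
    JOIN Comunas C ON L.comuna = C.id
    JOIN Predio P ON C.id = P.comuna AND P.avaluo_exento > C.avaluo_promedio
    WHERE L.calidad_construccion = 1
    GROUP BY C.id
    ORDER BY C.id
"""

# Same totals without GROUP BY: count each side once per comuna and multiply.
SUBQUERY_VERSION = """
    SELECT id, lineas * predios AS total
    FROM (
        SELECT C.id,
               (SELECT COUNT(*) FROM Linea L
                 WHERE L.calidad_construccion = 1
                   AND L.comuna = C.id) AS lineas,
               (SELECT COUNT(*) FROM Predio P
                 WHERE P.comuna = C.id
                   AND P.avaluo_exento > C.avaluo_promedio) AS predios
        FROM Comunas C
    )
    WHERE lineas > 0 AND predios > 0
    ORDER BY id
"""

join_rows = conn.execute(JOIN_QUERY).fetchall()
fast_rows = conn.execute(SUBQUERY_VERSION).fetchall()
print(join_rows)  # [(1, 4), (2, 1)]
print(fast_rows)  # [(1, 4), (2, 1)]
```

With the indexes listed above, each subquery can be answered from an index, so the work grows with the number of matching rows per table rather than with the number of Linea and Predio pairs per comuna.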