简体   繁体   中英

What is the fastest way to select count of entries in big SQL table?

I have a table with 200mil rows.
Lets call the table employee_internet_history .
Row : employee_fullname || website || date || more data employee_fullname || website || date || more data employee_fullname || website || date || more data .
The table have an index on employee_fullname column.

I also have another table eu_employees .
with 100 rows; each Row: employee_fullname || more data employee_fullname || more data .

I want to create a query to select top 3 websites surfed by each employee.

I am using Oracle Database so i thought about using PL/SQL to achieve this. Currently i am using

 declare
   cursor top100workers is 
      select * from eu_employees
      where rownum < 100;

 begin
    for worker in top100workers
      LOOP
         DBMS_OUTPUT.PUT_LINE(worker.employee_fullname ||' top 3 webpages:');
         for TOP3 in (
             SELECT  /*+ parallel*/ website,
            COUNT(website) AS num
            from employee_internet_history
            WHERE employee_internet_history.employee_fullname = worker.employee_fullname
            group by website
            order by num desc
         )
         LOOP
         DBMS_OUTPUT.PUT_LINE('website = ' || TOP3.website || ' ,times surferd: '||top3.num);
         end loop;
      end LOOP;
end;
/

For each employee this query takes around 200 seconds. And my real eu_employee table has over 8000 records.
Which Means it will take 19 days to count this using my way.

1) How can i speed things up?

2) Why dose it take so long?
If all the records of an employee are indexed, it should take O(1) to find them, and count them.

Also the queries are not depended on each other, 3) can i run few queries parallel ?

4) I saw there are several hints to run things at parallel mode, which one will fit best for my needs ?

5) is there a solution without using pl/sql ?

In general, a "set-based" approach (using a query) is going to be faster than using PL/SQL.

The following query does what you want:

select eih.*
from (select employee_id, website, count(*) as cnt,
             row_number() over (partition by employee_id order by count(*) desc) as seqnum
      from employee_internet_history eih
      group by employee_id, website
     ) eih
where seqnum <= 3;

I'm not sure if you can make it run faster, because you do have to start by aggregating the data at the employee/website level. If you want more employee information, then join in eu_employees .

By the way, using employee_fullname for the join key is a really bad idea. People might change their names throughout their lives for various reasons.

Let me also add that your query would probably run much faster with an index on employee_internet_history(employee_fullname, website) . You can also leave out the join to the worker information. At least nothing in the question suggests that it is needed (unless it is used for filtering).

EDIT:

The performance is highly dependent on your hardware and memory. You can speed the query by using a subset of employees by joining them in:

select eih.*
from (select employee_id, website, count(*) as cnt,
             row_number() over (partition by employee_id order by count(*) desc) as seqnum
      from employee_internet_history eih join
           (select ee.*
            from eu_employees ee
            where rownum < 100
           ) ee
           on eih.employee_id = w.employee_id
      group by employee_id, website
     ) eih
where seqnum <= 3;

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM