
Need help optimizing code: can't decide whether to use a DataFrame or MySQL for sorting

I am writing code to get data from a database with 1 million rows (the size can increase). I have a MySQL server running locally and am writing everything in Python. I am not very good at it and am trying to optimize everything.

My first question is whether I can write a better SQL query. My second question is whether I should try to do everything in MySQL, or whether it would be better to use a DataFrame for, say, sorting and filtering the data.

def listJE(company_id, page_num, per_page):
    # Column list built by concatenation; the parentheses allow the line break
    columns = ('tr_id, ' + 'tr_date, ' + 'description, ' + 'dr_acc, ' +
               'cr_acc, ' + 'amount, ' + 'currency, ' + 'document, ' + 'comment')

    sn = (page_num - 1) * per_page  # offset of the first row on this page
    en = per_page                   # number of rows to return
    ncon = myDB()
    query = """SELECT {}
               FROM transactions
               WHERE company_id = {} AND deleted = 0
               ORDER BY tr_id DESC
               LIMIT {}, {}""".format(
            columns, company_id, sn, en)

    df = ncon.getDF(query)

    return df

For your case, I would suggest using MySQL to do the sorting and return the records you need. Pandas is an amazing tool and can do a lot, but it might not be the best for you in this case.

Since you seem to be limiting the number of rows from a 1 million+ record table, it's likely more efficient to have MySQL sort through it and give you the records you need rather than package up the entire table, transfer it to your application, and then leave it to you to figure out the best way to sort through it and slice the appropriate records.
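To illustrate the point above, here is a minimal sketch in which the database does the filtering, sorting, and slicing, so only one page of rows ever reaches Python. It uses Python's built-in sqlite3 module with an in-memory table purely as a stand-in for the MySQL server. The sample rows and the helper name fetch_page are made up for the example; the table and column names follow the question.

```python
import sqlite3

def fetch_page(conn, company_id, page_num, per_page):
    """Return one page of rows, newest tr_id first, sorted and sliced by the database."""
    offset = (page_num - 1) * per_page
    cur = conn.execute(
        "SELECT tr_id, amount FROM transactions "
        "WHERE company_id = ? AND deleted = 0 "
        "ORDER BY tr_id DESC LIMIT ? OFFSET ?",
        (company_id, per_page, offset),
    )
    return cur.fetchall()

# Stand-in data: sqlite3 in memory instead of a real MySQL server (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    "tr_id INTEGER PRIMARY KEY, company_id INTEGER, amount REAL, deleted INTEGER)"
)
# 100 sample rows: odd ids belong to company 1, even ids to company 2,
# and every tenth row is marked as deleted.
conn.executemany(
    "INSERT INTO transactions (tr_id, company_id, amount, deleted) VALUES (?, ?, ?, ?)",
    [(i, 1 if i % 2 else 2, float(i), 1 if i % 10 == 0 else 0) for i in range(1, 101)],
)

# Only the 5 rows of page 2 are transferred to the application.
page = fetch_page(conn, company_id=1, page_num=2, per_page=5)
print(page)
```

MySQL accepts both the LIMIT offset, count form used in the question and the LIMIT count OFFSET offset form used here.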

If you are running the query many times (as your pagination seems to imply), MySQL may be able to cache the query result. So on the next iteration, it might just go "oh, I have this already!" and send you the result rather than recomputing it. Note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0, so this only applies to older versions.

Optimization is very nice to have, but consider the cost in time and readability. If you can save some time and make things more readable for the future, like hard-coding your column names in the query rather than concatenating them, then go ahead and do it. If you're worried about shaving a few milliseconds off by choosing between MySQL and processing in Python, weigh that effort against the value gained.
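As one possible way to hard-code the column names, the sketch below keeps the whole SELECT as a single constant and passes the changing values separately as parameters instead of formatting them into the string. The %s placeholders are the style used by common MySQL drivers such as mysql-connector-python and PyMySQL. The names PAGE_QUERY and page_params are made up here, and the question's myDB/getDF wrapper is left out because its interface is unknown.

```python
# Column names are written directly in the query; only values are parameters.
PAGE_QUERY = """
    SELECT tr_id, tr_date, description, dr_acc, cr_acc,
           amount, currency, document, comment
    FROM transactions
    WHERE company_id = %s AND deleted = 0
    ORDER BY tr_id DESC
    LIMIT %s, %s
"""

def page_params(company_id, page_num, per_page):
    """Build the (company_id, offset, row_count) tuple for PAGE_QUERY."""
    if page_num < 1 or per_page < 1:
        raise ValueError("page_num and per_page must be positive")
    offset = (page_num - 1) * per_page
    return (int(company_id), offset, per_page)

params = page_params(company_id=7, page_num=3, per_page=25)
print(params)  # (7, 50, 25)

# With a DB-API cursor from your driver (assumed to exist, not created here):
#     cursor.execute(PAGE_QUERY, params)
```

Passing the values as parameters also keeps user input out of the SQL text. If a DataFrame is still wanted, pandas.read_sql accepts a params argument, so the same query and tuple should be usable there, depending on the connection type.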

If you are creating a site that has low traffic, then a 5-second query might be annoying, but it might not be critical. But as was suggested in the comments, running it locally on a workstation might not be a good indication of how it will perform once you ultimately push it to the server.


 