
Would it be possible to optimize this Python code so it executes faster?

I have Python code that processes ~110k rows/second. I am wondering if it is possible to make it faster.

I am querying data from SQL and need to format it as JSON.

SQLquery= "SELECT value2 FROM mytable";
cursor.execute(SQLquery)

try:
    ReturnedQuery = cursor.fetchall()    
except Exception as ex:
    pass

if(cursor.description):
        #print(ReturnedQuery)
        colTypes = cursor.description
        column_names = [column[0] for column in colTypes]
        NrOfColumns = len(column_names)
        NrOfRows = len(ReturnedQuery)
        print(NrOfRows)
        Time1 = datetime.datetime.now()
        data = []
        for row in ReturnedQuery:
            i = 0
            dataRow = collections.OrderedDict()
            for field in row:
                dataRow[column_names[i]] = field
                i = i + 1
            data.append(dataRow)
        Time2 = datetime.datetime.now()
        TimeDiff =Time2 -Time1
        print(TimeDiff)

connection.commit()
cursor.close()

Querying one column from SQL returns this: [(0.2,), (0.3,)]

I need to format it to look like this:

[OrderedDict([('value2', 0.2)]), OrderedDict([('value2', 0.3)])]

EDIT: I filtered the query to get what I wanted instead. I am using TimescaleDB, so I used the following query:

SELECT time_bucket('30 minutes', datetime) AS thirty_min,
       AVG(value3) AS value3
FROM mytable
WHERE datetime > '2019-1-1 12:0:0.00' AND datetime < '2019-1-12 12:0:0.00'
GROUP BY thirty_min
ORDER BY thirty_min;
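
For reference, here is a minimal sketch of running this bucketed query from Python. It assumes psycopg2 (TimescaleDB runs on PostgreSQL) and placeholder connection parameters; adjust both for your setup.

import psycopg2
from collections import OrderedDict

# Connection parameters are placeholders -- adjust for your setup.
connection = psycopg2.connect(host="localhost", dbname="mydb",
                              user="myuser", password="mypassword")
cursor = connection.cursor()

# Pass the time range as query parameters instead of inlining literals.
cursor.execute(
    """
    SELECT time_bucket('30 minutes', datetime) AS thirty_min,
           AVG(value3) AS value3
    FROM mytable
    WHERE datetime > %s AND datetime < %s
    GROUP BY thirty_min
    ORDER BY thirty_min;
    """,
    ('2019-1-1 12:0:0.00', '2019-1-12 12:0:0.00'),
)

column_names = [column[0] for column in cursor.description]
data = [OrderedDict(zip(column_names, row)) for row in cursor.fetchall()]
cursor.close()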

You can use a list comprehension and close the connection as quickly as possible to save a few cycles. So this will probably be more efficient:

SQLquery= "SELECT value2 FROM mytable"
cursor.execute(SQLquery)

try:
    result = cursor.fetchall()    
except Exception as ex:
    pass

if cursor.description:
    column_names = [column[0] for column in cursor.description]
else:
    column_names = []
cursor.close()

if column_names:
    data = [OrderedDict(zip(column_names, row)) for row in result]
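
If you want to verify the difference yourself, here is a minimal, self-contained benchmark on synthetic rows (the row count and column name are made up for illustration, not taken from your database):

import timeit
from collections import OrderedDict

column_names = ['value2']
result = [(x / 10,) for x in range(110_000)]  # synthetic single-column rows

def nested_loop():
    # Original approach: explicit inner loop over fields.
    data = []
    for row in result:
        dataRow = OrderedDict()
        for i, field in enumerate(row):
            dataRow[column_names[i]] = field
        data.append(dataRow)
    return data

def comprehension():
    # Suggested approach: one comprehension with zip.
    return [OrderedDict(zip(column_names, row)) for row in result]

print('loop:', timeit.timeit(nested_loop, number=10))
print('comprehension:', timeit.timeit(comprehension, number=10))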

But perhaps you should check whether you really need all these rows in the first place. Usually, by filtering data before processing it, you can save cycles in a more structural way.

Assuming you are bottlenecked by CPU (Python runs your code in a single process), I suggest you try the multiprocessing module to spread the load. You can copy the fetched rows to a list, split it based on the number of cores, and create separate processes to convert each chunk, taking advantage of multiple cores. One problem is several processes writing results to the same shared variable; I have used Queue from the multiprocessing module to overcome this, as sketched below.
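
A rough sketch of that idea. The synthetic rows stand in for cursor.fetchall(), and the chunking and worker logic are illustrative, not tested against your schema:

import multiprocessing as mp
from collections import OrderedDict

def worker(chunk, column_names, queue):
    # Convert one chunk of rows and push the result to the shared queue.
    queue.put([OrderedDict(zip(column_names, row)) for row in chunk])

if __name__ == '__main__':
    column_names = ['value2']
    result = [(x / 10,) for x in range(110_000)]  # stands in for cursor.fetchall()

    cores = mp.cpu_count()
    size = -(-len(result) // cores)  # ceiling division
    chunks = [result[i:i + size] for i in range(0, len(result), size)]

    queue = mp.Queue()
    processes = [mp.Process(target=worker, args=(c, column_names, queue))
                 for c in chunks]
    for p in processes:
        p.start()

    # Drain the queue before joining to avoid a deadlock on full queue buffers.
    # Note: chunk order is not guaranteed across processes.
    data = []
    for _ in processes:
        data.extend(queue.get())
    for p in processes:
        p.join()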
