
Python script taking too long to run?

I am writing a Python script that basically does the following:

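  1. Read a CSV file into a DataFrame object.
  2. Select some columns by name and store them in a new DataFrame object.
  3. Do some math and string manipulation on the cell values. I use a for loop and the iterrows() method here.
  4. Write the modified DataFrame to CSV.
  5. Write the CSV to JSON using a for loop.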

This code takes forever to run. I am trying to understand why it takes so long, and whether I should do these tasks differently to speed it up.
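The likely culprit is step 3. iterrows() builds a new Series object for every row, and every `df.loc[index, col] = value` write goes through pandas' full indexing machinery, so the per-row overhead adds up quickly. The rough timing sketch below compares that pattern with a single column-wide operation. The data and the column names `gap` and `dur` are hypothetical stand-ins for the RTP columns, and the absolute timings will vary by machine.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical synthetic data: 2,000 rows, some with a zero duration.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "gap": rng.uniform(0, 10, n),
    "dur": rng.choice([0.0, 5.0, 20.0], n),
})

# Slow pattern: iterrows() plus one .loc write per row.
slow = df.copy()
t0 = time.perf_counter()
for idx, row in slow.iterrows():
    ratio = 0.0 if row["dur"] == 0 else row["gap"] / row["dur"]
    slow.loc[idx, "ratio"] = ratio
t_slow = time.perf_counter() - t0

# Vectorized pattern: one operation over whole columns; zero durations map to 0.
fast = df.copy()
t0 = time.perf_counter()
fast["ratio"] = np.where(fast["dur"] == 0, 0.0, fast["gap"] / fast["dur"])
t_fast = time.perf_counter() - t0

print("iterrows + .loc: %.3f s, vectorized: %.4f s" % (t_slow, t_fast))
```

Both versions compute the same column. The vectorized one simply hands the whole loop to NumPy.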

import pandas
import json
import pendulum
import csv
import os
import time

start_time = time.time()  # elapsed time is printed at the end of the script

os.chdir('/home/csv_files_from_REC')
df11 = pandas.read_csv('RTP_Gap_2018-01-21.csv') ### Reads the CSV FILE

print(df11.shape)  ### Prints the shape of the DF

### Filter the initial DF by selecting some columns based on NAME
df1 = df11[['ENODEB','DAY','HR','SITE','RTP_Gap_Length_Total_sec','RTP_Session_Duration_Total_sec','RTP_Gap_Duration_Ratio_Avg%']].copy()  # .copy() so later edits don't trigger SettingWithCopyWarning

print(df1.shape)  ## Prints Shape

#### Math and string manipulation stuff ###
for index, row in df1.iterrows():
    if row['DAY'] == 'Total':
        df1.drop(index, inplace=True)
    else:
        stamp = row['DAY'] + ' ' + str(row['HR']) + ':00:00'
        sitename = str(row['ENODEB'])+'_'+row['SITE']
        if row['RTP_Session_Duration_Total_sec'] == 0:
            rtp_gap = 0
        else:
            rtp_gap = row['RTP_Gap_Length_Total_sec']/row['RTP_Session_Duration_Total_sec']
        time1 = pendulum.parse(stamp,tz='America/Chicago').isoformat()
        df1.loc[index,'DAY'] = time1
        df1.loc[index,'SITE'] = sitename
        df1.loc[index,'HR'] = rtp_gap

### Write DF to CSV ###
df1.to_csv('RTP_json.csv',index=None)
json_file_ind = 'RTP_json.json'
file = open(json_file_ind, 'w')
file.write("")
file.close()

#### Write CSV to JSON ###
with open('RTP_json.csv', 'r') as csvfile:
    reader_ind = csv.DictReader(csvfile)
    for row in reader_ind:
        # Convert the numeric columns from strings back to floats
        row["RTP_Gap_Length_Total_sec"] = float(row["RTP_Gap_Length_Total_sec"])
        row["RTP_Session_Duration_Total_sec"] = float(row["RTP_Session_Duration_Total_sec"])
        row["RTP_Gap_Duration_Ratio_Avg%"] = float(row["RTP_Gap_Duration_Ratio_Avg%"])
        row["HR"] = float(row["HR"])
        # Note: the JSON file is reopened in append mode for every single row
        with open('RTP_json.json', 'a') as json_file_ind:
            json.dump(row, json_file_ind)
            json_file_ind.write('\n')

end_time = time.time()
print("--- %s seconds ---" % (end_time - start_time))  # total elapsed time

Output:

    --- 2018-01-23T12:25:07.411691-06:00 seconds ---### START TIME
    (2055, 36) ### SIZE of initial DF
    (2055, 7) ### Size of Filtered DF
    --- 2018-01-23T12:31:54.480568-06:00 seconds --- --- ### END TIME
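The posted output shows wall-clock timestamps rather than durations. The two timestamps put the whole run at roughly 6 minutes 47 seconds for 2,055 rows. To find out which step is slow, it helps to time each step separately. The sketch below uses a hypothetical helper, `timed`, built on `time.perf_counter()`. It also computes the duration implied by the two printed timestamps.

```python
import time
from datetime import datetime


def timed(label, func, *args, **kwargs):
    """Hypothetical helper: call func(*args, **kwargs), print elapsed time, return result."""
    start = time.perf_counter()  # monotonic, high-resolution clock meant for intervals
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print("--- %s: %.3f seconds ---" % (label, elapsed))
    return result


# Duration implied by the START and END timestamps in the posted output.
run_start = datetime.fromisoformat("2018-01-23T12:25:07.411691-06:00")
run_end = datetime.fromisoformat("2018-01-23T12:31:54.480568-06:00")
print(run_end - run_start)  # 0:06:47.068877
```

For example, `df11 = timed("read_csv", pandas.read_csv, 'RTP_Gap_2018-01-21.csv')` would time just the CSV load.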

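Answer: this part should speed up your DataFrame calculations considerably, because it replaces the iterrows() loop and the per-cell .loc writes with column-wide operations: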

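import numpy as np
import pandas
import pendulum

df1 = df11[['ENODEB','DAY','HR','SITE','RTP_Gap_Length_Total_sec','RTP_Session_Duration_Total_sec','RTP_Gap_Duration_Ratio_Avg%']]

print(df1.shape)  ## Prints Shape

# Drop the 'Total' rows with a boolean mask; drop=True keeps the old index out of the output
df1 = df1[df1.DAY != 'Total'].copy().reset_index(drop=True)
# Build all timestamp strings at once; pendulum.parse takes a single string, so apply it per value
stamps = df1['DAY'] + ' ' + df1['HR'].astype(str) + ':00:00'
df1['DAY'] = stamps.apply(lambda s: pendulum.parse(s, tz='America/Chicago').isoformat())
df1['SITE'] = df1['ENODEB'].astype(str) + '_' + df1['SITE']
# Gap ratio, 0 wherever the session duration is 0
df1['HR'] = np.where(df1['RTP_Session_Duration_Total_sec'] == 0, 0,
                     df1['RTP_Gap_Length_Total_sec'] / df1['RTP_Session_Duration_Total_sec'])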
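The timestamp step can also be vectorized completely by swapping pendulum for pandas' own `to_datetime`, `to_timedelta` and `Series.dt.tz_localize`. The sketch below assumes that `DAY` holds `YYYY-MM-DD` dates and `HR` holds integer hours, and it uses hypothetical sample rows. For these inputs it should produce the same ISO 8601 strings with a UTC offset. One caveat: by default `tz_localize` raises on local times that are nonexistent or ambiguous around DST changes, and its `nonexistent` and `ambiguous` parameters control that behavior.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like the question's CSV (DAY assumed to be YYYY-MM-DD).
df1 = pd.DataFrame({
    "ENODEB": [101, 102, 0],
    "DAY": ["2018-01-21", "2018-01-21", "Total"],
    "HR": [5, 13, 0],
    "SITE": ["AAA", "BBB", ""],
    "RTP_Gap_Length_Total_sec": [3.0, 4.0, 7.0],
    "RTP_Session_Duration_Total_sec": [12.0, 0.0, 12.0],
})

# 1. Remove the summary rows with a boolean mask.
df1 = df1[df1["DAY"] != "Total"].copy()

# 2. Date + hour offset, localized to Chicago, formatted as ISO 8601 strings.
stamps = pd.to_datetime(df1["DAY"], format="%Y-%m-%d") + pd.to_timedelta(df1["HR"], unit="h")
df1["DAY"] = stamps.dt.tz_localize("America/Chicago").map(lambda t: t.isoformat())

# 3. Site name built column-wise.
df1["SITE"] = df1["ENODEB"].astype(str) + "_" + df1["SITE"]

# 4. Gap ratio, 0 where the session duration is 0.
df1["HR"] = np.where(
    df1["RTP_Session_Duration_Total_sec"] == 0,
    0.0,
    df1["RTP_Gap_Length_Total_sec"] / df1["RTP_Session_Duration_Total_sec"],
)
print(df1[["DAY", "SITE", "HR"]])
```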

Also, why bother writing a CSV and then reading it back in?

Convert the DataFrame to JSON directly:

# orient='records' with lines=True writes one JSON object per line,
# the same layout the csv.DictReader loop produced
json_file_ind = 'RTP_json.json'
df1.to_json(json_file_ind, orient='records', lines=True)
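When called without a path, `DataFrame.to_json()` returns a single string. Looping over that string therefore visits individual characters, not records. Passing `lines=True`, which requires `orient='records'`, produces newline-delimited JSON instead. The quick in-memory check below uses a hypothetical two-row frame and writes to an `io.StringIO` buffer instead of `RTP_json.json`, then parses each line back with `json.loads`.

```python
import io
import json

import pandas as pd

# Hypothetical stand-in for the processed df1.
df1 = pd.DataFrame({"SITE": ["101_AAA", "102_BBB"], "HR": [0.25, 0.0]})

# Without a path, to_json() returns one str for the whole frame.
whole = df1.to_json(orient="records")

# With lines=True, each row becomes its own JSON object on its own line.
buf = io.StringIO()
df1.to_json(buf, orient="records", lines=True)
lines = buf.getvalue().splitlines()
records = [json.loads(line) for line in lines]
print(records)
```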

This should speed up the code considerably.
