I want to read a CSV file into a pandas DataFrame, then check whether one column's value equals a constant and keep the matching rows in a separate DataFrame.
The next step is to update one column of that separate DataFrame. For this I iterate over the whole DataFrame and update that column row by row, which takes too long because my DataFrame has thousands of rows.
Input.csv-
line_no,time
205,1467099122677889
205,1467099122677889
206,1467099363719028
207,1467099363818373
207,1467099363918360
208,1467099363818373
210,1467099363958749
Program-
import datetime
import sys

import pandas as pd

if __name__ == "__main__":
    file_path = 'Input.csv'
    input_line_no = 205
    pd_dataframe = pd.read_csv(file_path, delimiter=',', keep_default_na=False)
    # Keep only the rows whose line_no equals the requested value.
    match_df = pd.DataFrame(pd_dataframe.loc[pd_dataframe['line_no'] == int(input_line_no)])
    if match_df.empty:
        print('Given line no is not present in dataframe.')
        sys.exit(1)
    match_df = match_df.applymap(str)
    # Slow part: every row is converted and written back one at a time.
    for index in range(0, len(match_df.index)):
        epoch_time = match_df.iloc[index]['time']
        stamp = int(str(epoch_time) + '0')
        date = datetime.datetime.fromtimestamp(stamp / 10000000.0).strftime('%H:%M:%S %f')[:-3]
        match_df['time'].apply(str)
        match_df.iloc[index]['time'] = date
    print(match_df.to_csv(index=False))
The time column holds epoch time, and I want to convert it into a human-readable timestamp; the logic above exists only for that purpose.
However, this task takes too long to execute. Is there a faster way to update a column of an existing DataFrame?
IIUC, you can first select the matching rows:
match_df = pd_dataframe[pd_dataframe['line_no'] == int(input_line_no)].copy()
print (match_df)
   line_no              time
0      205  1467099122677889
1      205  1467099122677889
Then you can use apply, because of the Timestamp limitations:
In [55]: pd.Timestamp.max
Out[55]: Timestamp('2262-04-11 23:47:16.854775807')
match_df['time'] = match_df.time.apply(
    lambda x: datetime.datetime.fromtimestamp(int(str(x) + '0') / 10000000.0))
print (match_df)
   line_no                       time
0      205 2016-06-28 09:32:02.677889
1      205 2016-06-28 09:32:02.677889
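As a side note on the arithmetic: appending '0' to the value and dividing by 10000000.0 gives the same result as dividing the original value by 1000000.0, so the expression treats the column as microseconds since the epoch. A minimal sketch that checks this for the first sample value (the variable names are only illustrative, and UTC is used here so the result does not depend on the machine's timezone, unlike the local-time output shown above):

```python
import datetime

raw = 1467099122677889  # first value of the time column in Input.csv

# The expression used in the question and in the apply call.
via_string = int(str(raw) + '0') / 10000000.0
# The same quantity, computed directly as microseconds -> seconds.
direct = raw / 1000000.0

# Convert in UTC so the result is the same on every machine.
moment = datetime.datetime.fromtimestamp(direct, tz=datetime.timezone.utc)
formatted = moment.strftime('%H:%M:%S %f')[:-3]
print(formatted)  # 07:32:02 677
```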
And then:
match_df['time'] = match_df.time.apply(
    lambda x: datetime.datetime.fromtimestamp(int(str(x) + '0') / 10000000.0)
                               .strftime('%H:%M:%S %f')[:-3])
print (match_df)
   line_no          time
0      205  09:32:02 677
1      205  09:32:02 677
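If the values really are microseconds since the epoch, as the sample data suggests, a vectorized conversion may also be worth trying on large frames. This is not the approach shown above, only a sketch under that assumption: it uses pd.to_datetime with unit='us' and the .dt.strftime accessor, and it formats the time in UTC, whereas datetime.datetime.fromtimestamp uses the local timezone of the machine (which is why the output above shows 09:32:02). The inline CSV text stands in for Input.csv.

```python
import io

import pandas as pd

# Stand-in for Input.csv (first rows of the sample data).
csv_text = """line_no,time
205,1467099122677889
205,1467099122677889
206,1467099363719028
"""

pd_dataframe = pd.read_csv(io.StringIO(csv_text))
match_df = pd_dataframe[pd_dataframe['line_no'] == 205].copy()

# Assumption: the integers are microseconds since the epoch (UTC).
parsed = pd.to_datetime(match_df['time'], unit='us')

# Format the whole column at once, then drop the last three digits
# of the microsecond field to keep milliseconds only.
match_df['time'] = parsed.dt.strftime('%H:%M:%S %f').str[:-3]

print(match_df.to_csv(index=False))
```

With the sample rows this yields 07:32:02 677 for both matches. If local time is needed, the parsed column would first have to be localized to UTC and converted to the desired timezone.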