
How to update all rows in a particular column of a pandas DataFrame in Python?

I want to read a CSV file into a pandas DataFrame, then check whether one column's value equals a constant and keep the matching rows in a separate DataFrame.

The next step is to update one column of that separate DataFrame. Here I iterate through the whole DataFrame and update the column row by row, which takes too long because my DataFrame has thousands of rows.

Input.csv:

line_no,time
205,1467099122677889
205,1467099122677889
206,1467099363719028
207,1467099363818373
207,1467099363918360
208,1467099363818373
210,1467099363958749

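Program:

import datetime
import sys

import pandas as pd

if __name__ == "__main__":

    file_path = 'Input.csv'
    input_line_no = 205

    pd_dataframe = pd.read_csv(file_path, delimiter=',', keep_default_na=False)
    # Keep only the rows whose line_no matches; copy so later writes do not touch pd_dataframe.
    match_df = pd_dataframe.loc[pd_dataframe['line_no'] == int(input_line_no)].copy()

    if match_df.empty:
        print('Given line no is not present in dataframe.')
        sys.exit(1)
    # Convert every cell to a string so the formatted time can be stored in the same column.
    match_df = match_df.astype(str)
    time_col = match_df.columns.get_loc('time')
    for index in range(len(match_df.index)):

        epoch_time = match_df.iloc[index, time_col]
        # Appending '0' and dividing by 1e7 is the same as dividing the original value by 1e6.
        stamp = int(str(epoch_time) + '0')
        date = datetime.datetime.fromtimestamp(stamp / 10000000.0).strftime('%H:%M:%S %f')[:-3]
        # Assign with a single .iloc[row, column] call; chained indexing may write to a temporary copy.
        match_df.iloc[index, time_col] = date

    print(match_df.to_csv(index=False))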

The time column holds epoch time, and I want to convert it into a human-readable timestamp; that is the only purpose of the logic above.

However, this approach takes too long to execute. Is there a faster way to update a column of an existing DataFrame?

If I understand correctly, you can first select the matching rows and take a copy:

match_df = pd_dataframe[pd_dataframe['line_no'] == int(input_line_no)].copy()
print (match_df)
   line_no              time
0      205  1467099122677889
1      205  1467099122677889

You can then use apply with datetime.datetime.fromtimestamp to convert each value. Keep in mind that pandas timestamps have an upper limit:

In [55]: pd.Timestamp.max
Out[55]: Timestamp('2262-04-11 23:47:16.854775807')
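
As a quick sanity check, the sketch below tests whether one of the sample values falls inside the range pandas can represent. It assumes the time column holds microseconds since the Unix epoch, which matches the sample data but is not stated in the question; `sample_us` is a name used here only for illustration.

```python
import pandas as pd

# Assumption: the value is microseconds since the Unix epoch (UTC).
sample_us = 1467099122677889

# Build a timestamp from the integer; the result is timezone-naive and expressed in UTC.
ts = pd.Timestamp(sample_us, unit="us")
print(ts)

# True when the value is inside the range pandas supports.
in_range = pd.Timestamp.min <= ts <= pd.Timestamp.max
print(in_range)
```

If this prints True for your data, the bound above should not get in the way of a pandas-native conversion.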

# requires: import datetime
match_df['time'] = (match_df.time
                            .apply(lambda x: datetime.datetime.fromtimestamp(int(str(x)+'0')
                                    / 10000000.0)))
print (match_df)
   line_no                       time
0      205 2016-06-28 09:32:02.677889
1      205 2016-06-28 09:32:02.677889

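Alternatively, to get the formatted string directly (fromtimestamp returns local time, so the hour shown depends on the machine's time zone):

# requires: import datetime
match_df['time'] = (match_df.time
                            .apply(lambda x: datetime.datetime.fromtimestamp(int(str(x)+'0')
                                    / 10000000.0).strftime('%H:%M:%S %f')[:-3]))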
print (match_df)
   line_no          time
0      205  09:32:02 677
1      205  09:32:02 677
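
If the values are microseconds since the Unix epoch (an assumption based on the sample data), a vectorized conversion may be faster than calling a Python function per row. The following is a minimal sketch rather than a drop-in replacement: it builds the sample rows inline instead of reading Input.csv, and it uses pd.to_datetime with unit='us', which yields UTC times rather than the local times that fromtimestamp produces.

```python
import pandas as pd

# Sample rows copied from Input.csv so the sketch is self-contained.
pd_dataframe = pd.DataFrame({
    "line_no": [205, 205, 206],
    "time": [1467099122677889, 1467099122677889, 1467099363719028],
})
input_line_no = 205

# Select the matching rows and copy them so the original frame stays untouched.
match_df = pd_dataframe[pd_dataframe["line_no"] == input_line_no].copy()

# Assumption: values are microseconds since the epoch; the result is naive UTC.
converted = pd.to_datetime(match_df["time"], unit="us")

# Format the whole column at once, then trim microseconds down to milliseconds.
match_df["time"] = converted.dt.strftime("%H:%M:%S %f").str[:-3]

print(match_df.to_csv(index=False))
```

Because the result is in UTC, the hours will differ from the output above whenever the machine's time zone is not UTC. If local time is needed, one option is to localize the converted column to UTC and convert it to the desired zone with the .dt.tz_localize and .dt.tz_convert accessors before formatting.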
