How to update all rows in a particular column of a pandas dataframe in Python?

I want to read a CSV file into a pandas data frame, then check whether the value in one column equals a constant, and keep the matching rows in a separate data frame.

The next step is to update one column of that separate data frame. In this step I iterate over the whole data frame and update every row of that column, which takes too much time because my data frame has thousands of rows.

Input.csv:

line_no,time
205,1467099122677889
205,1467099122677889
206,1467099363719028
207,1467099363818373
207,1467099363918360
208,1467099363818373
210,1467099363958749

Program:

import datetime
import sys

import pandas as pd

if __name__ == "__main__":

    file_path = 'Input.csv'
    input_line_no = 205

    pd_dataframe = pd.read_csv(file_path, delimiter=',', keep_default_na=False)
    # Keep only the rows whose line_no matches, as a separate data frame.
    match_df = pd.DataFrame(pd_dataframe.loc[pd_dataframe['line_no'] == int(input_line_no)])

    if match_df.empty:
        print('Given line no is not present in dataframe.')
        sys.exit(1)
    match_df = match_df.astype(str)
    time_col = match_df.columns.get_loc('time')
    # Slow part: the time column is converted one row at a time.
    for index in range(0, len(match_df.index)):

        epoch_time = match_df.iloc[index]['time']
        stamp = int(str(epoch_time) + '0')
        date = datetime.datetime.fromtimestamp(stamp / 10000000.0).strftime('%H:%M:%S %f')[:-3]
        # Positional assignment; chained iloc[index]['time'] = ... would write to a copy.
        match_df.iloc[index, time_col] = date

    print(match_df.to_csv(index=False))

The time column holds epoch time, and I want to convert it into a human-readable timestamp; the logic above exists only for that purpose.

But I'm facing an execution time issue with this task. Is there any other way to update the existing data frame's column faster?

IIUC, you can first select the matching rows with boolean indexing and take a copy:

match_df = pd_dataframe[pd_dataframe['line_no'] == int(input_line_no)].copy()
print (match_df)
   line_no              time
0      205  1467099122677889
1      205  1467099122677889

You can then use apply, because of the timestamp limitations:

In [55]: pd.Timestamp.max
Out[55]: Timestamp('2262-04-11 23:47:16.854775807')

import datetime

# Parentheses let the method chain continue across lines.
match_df['time'] = (match_df.time
                            .apply(lambda x: datetime.datetime.fromtimestamp(int(str(x)+'0')
                                    / 10000000.0)))
print (match_df)
   line_no                       time
0      205 2016-06-28 09:32:02.677889
1      205 2016-06-28 09:32:02.677889

And then, starting again from the original epoch values, to get the formatted time string:

# Parentheses let the method chain continue across lines.
match_df['time'] = (match_df.time
                            .apply(lambda x: datetime.datetime.fromtimestamp(int(str(x)+'0')
                                    / 10000000.0).strftime('%H:%M:%S %f')[:-3]))
print (match_df)
   line_no          time
0      205  09:32:02 677
1      205  09:32:02 677
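
Since the question asks for something faster than row-by-row updates, here is a minimal sketch of a vectorized alternative. It is not part of the answer above. It assumes the time column holds integer microseconds since the epoch, which is what the sample values look like, and that every value fits inside the pd.Timestamp range. The helper name format_epoch_micros and its tz parameter are made up for this illustration. Unlike datetime.fromtimestamp, which uses the machine's local time zone, this sketch works in UTC unless a zone name is passed.

```python
import io

import pandas as pd

# Sample data copied from Input.csv in the question.
CSV_TEXT = """line_no,time
205,1467099122677889
205,1467099122677889
206,1467099363719028
207,1467099363818373
207,1467099363918360
208,1467099363818373
210,1467099363958749
"""


def format_epoch_micros(series, tz=None):
    """Convert integer microsecond epochs to 'HH:MM:SS mmm' strings.

    Hypothetical helper for illustration. With tz=None the result is in UTC;
    otherwise tz is a time zone name to convert to before formatting.
    """
    # Assumption: values are microseconds since 1970-01-01 UTC.
    stamps = pd.to_datetime(series, unit='us', utc=True)
    if tz is not None:
        stamps = stamps.dt.tz_convert(tz)
    # '%f' yields six microsecond digits; dropping the last three keeps milliseconds.
    return stamps.dt.strftime('%H:%M:%S %f').str[:-3]


pd_dataframe = pd.read_csv(io.StringIO(CSV_TEXT))
match_df = pd_dataframe[pd_dataframe['line_no'] == 205].copy()
# The whole column is converted in one call, with no Python-level loop over rows.
match_df['time'] = format_epoch_micros(match_df['time'])
print(match_df.to_csv(index=False))
```

The hour printed here can differ from the output shown above, because that output reflects the local time zone of the machine it was run on. Passing a zone name through tz is one way to reproduce a local-time result.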
