簡體   English   中英

如何更改熊貓的timedelta格式

[英]How to change format of timedelta in pandas

    import pandas as pd
    import datetime
    import numpy as np
    from datetime import timedelta

    def diff_func(row):
            return (row['Timestamp'] - row['previous_end'])


    dfMockLog = [     (1, ("2017-01-01 09:00:00"), "htt://x.org/page1.html"),
                      (1, ("2017-01-01 09:01:00"), "htt://x.org/page2.html"),
                      (1, ("2017-01-01 09:02:00"), "htt://x.org/page3.html"),
                      (1, ("2017-01-01 09:05:00"), "htt://x.org/page3.html"),
                      (1, ("2017-01-01 09:30:00"), "htt://x.org/page2.html"),
                      (1, ("2017-01-01 09:33:00"), "htt://x.org/page1.html"),
                      (1, ("2017-01-01 09:37:00"), "htt://x.org/page2.html"),
                      (1, ("2017-01-01 09:41:00"), "htt://x.org/page3.html"),
                      (1, ("2017-01-01 10:00:00"), "htt://x.org/page1.html"),
                      (1, ("2017-01-01 11:00:00"), "htt://x.org/page2.html"),
                      (2, ("2017-01-01 09:41:00"), "htt://x.org/page3.html"),
                      (2, ("2017-01-01 09:42:00"), "htt://x.org/page1.html"),
                      (2, ("2017-01-01 09:43:00"), "htt://x.org/page2.html")]

    dfMockLog = pd.DataFrame(dfMockLog, columns = ['user', 'Timestamp', 'url'])
    dfMockLog['Timestamp'] = pd.to_datetime(dfMockLog['Timestamp'])
    dfMockLog = dfMockLog.sort_values(['user','Timestamp'])

    dfMockLog['previous_end'] = dfMockLog.groupby(['user'])['Timestamp'].shift(1)

    dfMockLog['time_diff'] = dfMockLog.apply(diff_func, axis=1)

    dfMockLog['cum_sum'] = dfMockLog['time_diff'].cumsum()

    print(dfMockLog)

我需要將“ timediff”列轉換為秒。“ cum_sum”列應包含按“用戶”划分的累積總和。 如果可以共享所有可能的timedelta格式,那就太好了。

你近了 我更喜歡的方法是通過pd.Series.dt.seconds創建一個包含time_diff的新列(以秒為pd.Series.dt.seconds 然后使用groupby.transform提取usercumsum

dfMockLog['time_diff_secs'] = dfMockLog['time_diff'].dt.seconds
dfMockLog['cum_sum'] = dfMockLog.groupby('user')['time_diff_secs'].transform('cumsum')

print(dfMockLog)

    user           Timestamp                     url        previous_end  \
0      1 2017-01-01 09:00:00  htt://x.org/page1.html                 NaT   
1      1 2017-01-01 09:01:00  htt://x.org/page2.html 2017-01-01 09:00:00   
2      1 2017-01-01 09:02:00  htt://x.org/page3.html 2017-01-01 09:01:00   
3      1 2017-01-01 09:05:00  htt://x.org/page3.html 2017-01-01 09:02:00   
4      1 2017-01-01 09:30:00  htt://x.org/page2.html 2017-01-01 09:05:00   
5      1 2017-01-01 09:33:00  htt://x.org/page1.html 2017-01-01 09:30:00   
6      1 2017-01-01 09:37:00  htt://x.org/page2.html 2017-01-01 09:33:00   
7      1 2017-01-01 09:41:00  htt://x.org/page3.html 2017-01-01 09:37:00   
8      1 2017-01-01 10:00:00  htt://x.org/page1.html 2017-01-01 09:41:00   
9      1 2017-01-01 11:00:00  htt://x.org/page2.html 2017-01-01 10:00:00   
10     2 2017-01-01 09:41:00  htt://x.org/page3.html                 NaT   
11     2 2017-01-01 09:42:00  htt://x.org/page1.html 2017-01-01 09:41:00   
12     2 2017-01-01 09:43:00  htt://x.org/page2.html 2017-01-01 09:42:00   

    time_diff  time_diff_secs  cum_sum  
0         NaT             NaN      NaN  
1    00:01:00            60.0     60.0  
2    00:01:00            60.0    120.0  
3    00:03:00           180.0    300.0  
4    00:25:00          1500.0   1800.0  
5    00:03:00           180.0   1980.0  
6    00:04:00           240.0   2220.0  
7    00:04:00           240.0   2460.0  
8    00:19:00          1140.0   3600.0  
9    01:00:00          3600.0   7200.0  
10        NaT             NaN      NaN  
11   00:01:00            60.0     60.0  
12   00:01:00            60.0    120.0  

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM