
numpy: get rid of loop

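I'm learning numpy by practicing, and I'm having some trouble with this. I have to write a function that takes an np_array as an argument and returns a new np_array. The argument looks like this: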

>> log
array([['2015-05-08T15:46:06+0200', '2015-05-08T17:21:36+0200'],
       ['2015-05-08T17:10:53+0200', '2015-05-09T06:30:08+0200'],
       ['2015-08-09T22:38:45+0200', '2015-08-09T22:38:45+0200'],
       ['2015-08-09T22:41:33+0200', '2015-08-10T08:39:26+0200'],
       ['2015-08-11T17:25:52+0200', '2015-08-12T08:14:30+0200'],
       ['2015-08-13T13:12:08+0200', '2015-08-13T19:42:50+0200'],
       ['2015-08-13T17:30:18+0200', '2015-08-14T10:13:10+0200'],
       ['2015-10-20T13:42:07+0200', '2015-10-20T16:13:37+0200'],
       ['2015-10-21T10:27:05+0200', '2015-10-21T16:13:11+0200'],
       ['2015-12-05T13:28:51+0100', '2015-12-05T22:43:20+0200']], dtype='datetime64[s]')

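log holds information about connections to a server. The first element of each row is the login date, and the second element is the corresponding logout date.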

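The new np_array should hold the number of hours connected to the server in each week, from the Monday before the first connection to the Monday after the last connection.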

>> func(log)
array([[time_connected_week1,
        time_connected_week2,
        time_connected_week3,
               ...
        time_connected_weekn]], dtype='timedelta64[s]')

week1 (weekn) must correspond to the first (last) week of the log array.

I wrote the following code:

import numpy as np

def func(log):
    begin = np.datetime64("2015-05-04")    # first Monday (hard-coded)
    end = np.datetime64("2015-12-07")      # last Monday (hard-coded)

    week_td64 = np.timedelta64(1, 'W')
    nbWeek_td64 = int((end - begin) / week_td64)    # number of weeks

    week = begin + np.arange(nbWeek_td64) * week_td64    # start of week1 ... weekn

    weekHours = []       # list to store return values

    for w in week:
        mask1 = log[:, 0] >= w                 # login on or after the start of the week
        mask2 = log[:, 0] < w + week_td64      # ... and before the start of the next week
        l = log[mask1 & mask2]                 # log rows whose login falls in the current week

        totalweek = (l[:, 1] - l[:, 0]).sum()  # total connected time for this week

        weekHours.append(totalweek)            # save value

    return np.array(weekHours)

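I have two questions about my code:
1/ How can I find the first Monday automatically? np.datetime64 doesn't support weekday(). Do I have to use datetime.datetime?
2/ How can I get rid of the loop? I've been told that numpy makes it possible to get rid of loops. I believe it can be done with fancy slicing.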

For your first question, about finding the first Monday automatically, you can use busday_offset with a weekmask that treats only Monday as a valid business day:

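import numpy as np

def first_monday(firstDay):
    firstEntry = firstDay.astype('M8[D]')      # truncate the timestamp to its day
    # Only Monday is a valid day: 'forward' rolls to the next Monday, then -1 steps back one Monday
    beforeMonday = np.busday_offset(firstEntry, -1, 'forward', [1,0,0,0,0,0,0])
    if firstEntry - beforeMonday == np.timedelta64(7, 'D'):
        return firstEntry                      # firstEntry was already a Monday
    else:
        return beforeMonday

firstDay = np.min(log[:, 0])
firstMonday = first_monday(firstDay)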
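A slightly shorter variant of the same busday_offset idea, offered as a sketch rather than as the answerer's code: with an offset of 0, busday_offset only applies the roll, so roll='backward' lands on the Monday on or before a date and roll='forward' on the Monday on or after it, and a date that is already a Monday comes back unchanged without a special case. The helper names monday_on_or_before and monday_on_or_after are hypothetical, the weekmask is written as the string 'Mon' (an accepted alternative to the 7-element list), and the sample timestamps are the first login and last logoff converted by hand to naive UTC.

```python
import numpy as np

def monday_on_or_before(t):
    # Hypothetical helper: truncate to a calendar day, then roll back to a Monday.
    # Offset 0 means "apply the roll only"; a Monday is left as it is.
    return np.busday_offset(t.astype('datetime64[D]'), 0, roll='backward', weekmask='Mon')

def monday_on_or_after(t):
    # Same idea, rolling forward to the next Monday instead.
    return np.busday_offset(t.astype('datetime64[D]'), 0, roll='forward', weekmask='Mon')

first_login = np.datetime64('2015-05-08T13:46:06')   # a Friday (UTC)
last_logoff = np.datetime64('2015-12-05T20:43:20')   # a Saturday (UTC)
print(monday_on_or_before(first_login))   # 2015-05-04
print(monday_on_or_after(last_logoff))    # 2015-12-07
```

These two calls give the same begin and end Mondays that the question hard-codes.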

Hint: you can get rid of the loop with np.tile() on the log and np.repeat().
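To make the hint concrete, here is a toy sketch with made-up timestamps (not taken from the log above): np.tile repeats the whole login array once per week and np.repeat repeats each week start once per login, so position k of both arrays describes the pair (week k // n_logs, login k % n_logs), and a single vectorised comparison replaces the loop. Recovering both indices with np.divmod on the matching positions is an illustrative choice, not part of the hint.

```python
import numpy as np

# Hypothetical toy data: three logins and two week starts.
login = np.array(['2015-05-05T10:00', '2015-05-12T09:00', '2015-05-13T18:00'],
                 dtype='datetime64[s]')
week = np.array(['2015-05-04', '2015-05-11'], dtype='datetime64[D]')
n_logs, n_weeks = login.size, week.size
one_week = np.timedelta64(1, 'W')

tiled_login = np.tile(login, n_weeks)       # l0 l1 l2 l0 l1 l2
repeated_week = np.repeat(week, n_logs)     # w0 w0 w0 w1 w1 w1

# True where a login falls inside the week it is paired with.
mask = (tiled_login >= repeated_week) & (tiled_login < repeated_week + one_week)

# Recover (week index, login index) for every matching pair.
week_idx, log_idx = np.divmod(np.flatnonzero(mask), n_logs)
print(week_idx)  # [0 1 1]
print(log_idx)   # [0 1 2]
```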

Final answer: don't read it unless you give up.

First, define a GetMonday function:

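import numpy as np

def GetMonday(firstDay, forward=False):
    # Monday on or before firstDay (forward=False) or on or after it (forward=True)
    firstEntry = firstDay.astype('M8[D]')
    # Only Monday is a valid day. Roll to the nearest Monday on the opposite side,
    # then step one Monday in the wanted direction (offset -1 or +1).
    roll = 'backward' if forward else 'forward'
    beforeMonday = np.busday_offset(firstEntry, forward*2-1, roll, [1,0,0,0,0,0,0])
    if abs(firstEntry-beforeMonday) == np.timedelta64(7, 'D'):
        # firstEntry is already a Monday, so it is the answer itself
        return firstEntry.astype('M8[s]')
    else:
        return beforeMonday.astype('M8[s]')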
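On the question of whether datetime.datetime is needed: an alternative sketch that avoids both datetime and busday_offset relies on the fact that the datetime64 epoch, 1970-01-01, was a Thursday. Counting days since the epoch and adding 3 gives a weekday number with Monday = 0 (the same convention as datetime.date.weekday()), and subtracting that number yields the Monday of each timestamp's week. It works on whole arrays at once; the names weekday and week_start are hypothetical, and the sample timestamps are rows of the log converted by hand to naive UTC.

```python
import numpy as np

def weekday(t):
    # Days since 1970-01-01, which was a Thursday; +3 makes Monday == 0
    # (the same numbering as datetime.date.weekday()).
    days = t.astype('datetime64[D]').astype(np.int64)
    return (days + 3) % 7

def week_start(t):
    # Step back by the weekday number to reach the Monday of that week.
    return t.astype('datetime64[D]') - weekday(t).astype('timedelta64[D]')

login = np.array(['2015-05-08T13:46:06', '2015-08-09T20:38:45', '2015-12-05T12:28:51'],
                 dtype='datetime64[s]')
print(weekday(login))      # [4 6 5]  (Friday, Sunday, Saturday)
print(week_start(login))   # ['2015-05-04' '2015-08-03' '2015-11-30']
```

If you would rather convert, login.astype(object) yields datetime.datetime objects whose .weekday() agrees with this, at the cost of dropping back to Python objects.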

Then you can write the code:

log = np.array([['2015-05-08T15:46:06+0200', '2015-05-08T17:21:36+0200'],
   ['2015-05-08T17:10:53+0200', '2015-05-09T06:30:08+0200'],
   ['2015-08-09T22:38:45+0200', '2015-08-09T22:38:45+0200'],
   ['2015-08-09T22:41:33+0200', '2015-08-10T08:39:26+0200'],
   ['2015-08-11T17:25:52+0200', '2015-08-12T08:14:30+0200'],
   ['2015-08-13T13:12:08+0200', '2015-08-13T19:42:50+0200'],
   ['2015-08-13T17:30:18+0200', '2015-08-14T10:13:10+0200'],
   ['2015-10-20T13:42:07+0200', '2015-10-20T16:13:37+0200'],
   ['2015-10-21T10:27:05+0200', '2015-10-21T16:13:11+0200'],
   ['2015-12-05T13:28:51+0100', '2015-12-05T22:43:20+0200']], dtype='datetime64[s]')

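login = log[:,0]
logoff = log[:,1]
begin = GetMonday(np.min(login))
end = GetMonday(np.max(logoff), True)

n_logs = log.shape[0]              # must stay an int: np.repeat needs an integer count
week_td64 = np.timedelta64(1, 'W')
nbWeek_td64 = int((end - begin) / week_td64)

week = begin + np.arange(nbWeek_td64) * week_td64

# Build every (week, login) pair: position k is week k // n_logs and login k % n_logs
tiledLogin = np.tile(login, nbWeek_td64)
repeatedWeek = np.repeat(week, n_logs)
repeatedWeek_order = np.repeat(np.arange(nbWeek_td64), n_logs)
tiledLogin_order = np.tile(np.arange(n_logs), nbWeek_td64)

# True where the login falls inside the week it is paired with
loginWeekMask = (tiledLogin >= repeatedWeek) & (tiledLogin < repeatedWeek + week_td64)

hours_spent = (logoff-login).astype('timedelta64[h]')   # truncated to whole hours
weeks_entry = repeatedWeek_order[loginWeekMask]
logs_entry = tiledLogin_order[loginWeekMask]           # log row each match came from

print(np.bincount(weeks_entry.astype('int64'), hours_spent[logs_entry].astype('float64')))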
#[ 14.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   9.  36.
#   0.   0.   0.   0.   0.   0.   0.   0.   0.   7.   0.   0.   0.   0.   0.
#   8.]

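This gives you an array of the hours spent each week. It is not quite the correct final answer: a session can span more than one week (its login and logoff fall in different weeks), and the code above credits its whole duration to the login week. I'll let you figure out how to handle that.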
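One way to handle that without going back to a loop, offered as a sketch under stated assumptions rather than as the answerer's solution: use broadcasting instead of np.tile/np.repeat to pair every week with every session as an (n_weeks, n_logs) grid, then clip each session to each week with np.maximum/np.minimum and keep only positive overlaps, so a session that crosses Monday 00:00 is split between the two weeks. The function name weekly_overlap_hours is hypothetical, the timestamps are naive UTC (so week boundaries fall on Monday 00:00 UTC, not local +0200 midnight), and durations are exact float hours rather than truncated ones.

```python
import numpy as np

def weekly_overlap_hours(login, logoff, begin, n_weeks):
    one_week = np.timedelta64(1, 'W')
    start = (begin + np.arange(n_weeks) * one_week).astype('datetime64[s]')  # week starts
    end = start + one_week                                                  # week ends
    # Overlap of session j with week i is [max(login_j, start_i), min(logoff_j, end_i)).
    lo = np.maximum(login[None, :], start[:, None])     # shape (n_weeks, n_logs)
    hi = np.minimum(logoff[None, :], end[:, None])
    # A negative length means the session and the week do not overlap.
    overlap = np.clip((hi - lo) / np.timedelta64(1, 'h'), 0, None)
    return overlap.sum(axis=1)

# Two sessions from the question (naive UTC): one in week 0, one Sunday-to-Monday.
login = np.array(['2015-05-08T13:46:06', '2015-08-09T20:41:33'], dtype='datetime64[s]')
logoff = np.array(['2015-05-08T15:21:36', '2015-08-10T06:39:26'], dtype='datetime64[s]')
hours = weekly_overlap_hours(login, logoff, np.datetime64('2015-05-04'), 15)
print(hours[[0, 13, 14]])   # week 0, then the Sunday-night session split over weeks 13 and 14
```

Like the tile/repeat version, this materialises an n_weeks × n_logs array, which is fine for a log of this size but grows with both dimensions.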

Sorry, I missed this earlier. There is actually an easier way to find which week a log entry belongs to, without np.tile and np.repeat.

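All you have to do is compute the timedelta64 since the first Monday and divide it by one week; rounding down gives the week each entry belongs to: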

log = np.array([['2015-05-08T15:46:06+0200', '2015-05-08T17:21:36+0200'],
   ['2015-05-08T17:10:53+0200', '2015-05-09T06:30:08+0200'],
   ['2015-08-09T22:38:45+0200', '2015-08-09T22:38:45+0200'],
   ['2015-08-09T22:41:33+0200', '2015-08-10T08:39:26+0200'],
   ['2015-08-11T17:25:52+0200', '2015-08-12T08:14:30+0200'],
   ['2015-08-13T13:12:08+0200', '2015-08-13T19:42:50+0200'],
   ['2015-08-13T17:30:18+0200', '2015-08-14T10:13:10+0200'],
   ['2015-10-20T13:42:07+0200', '2015-10-20T16:13:37+0200'],
   ['2015-10-21T10:27:05+0200', '2015-10-21T16:13:11+0200'],
   ['2015-12-05T13:28:51+0100', '2015-12-05T22:43:20+0200']], dtype='datetime64[s]')

login = log[:,0]
logoff = log[:,1]
begin = GetMonday(np.min(login))

week_td64 = np.timedelta64(1, 'W')

# whole weeks elapsed since the first Monday = index of the week each login belongs to
weeks_entry = np.floor((login-begin)/week_td64)
hours_spent = (logoff-login).astype('timedelta64[h]')   # truncated to whole hours

print(np.bincount(weeks_entry.astype('int64'), hours_spent.astype('float64')))
#[ 14.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   0.   9.  36.
#   0.   0.   0.   0.   0.   0.   0.   0.   0.   7.   0.   0.   0.   0.   0.
#   8.]
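Putting the pieces together, here is a sketch of a function shaped like the one the question asks for: it returns one timedelta64[s] per week, from the Monday on or before the first login to the Monday on or after the last logoff. It assumes naive UTC timestamps, credits each session to its login week (as the answer does), and sums exact seconds instead of truncated hours, so week 0 comes out as 53685 s (14 h 54 min 45 s) rather than 14. The name weekly_connection_time is hypothetical; timedelta floor division (//) and bincount's minlength are used in place of np.floor and the default bincount length.

```python
import numpy as np

def weekly_connection_time(log):
    login, logoff = log[:, 0], log[:, 1]
    one_week = np.timedelta64(1, 'W')
    # Monday on or before the first login, Monday on or after the last logoff.
    begin = np.busday_offset(login.min().astype('datetime64[D]'), 0,
                             roll='backward', weekmask='Mon')
    end = np.busday_offset(logoff.max().astype('datetime64[D]'), 0,
                           roll='forward', weekmask='Mon')
    n_weeks = int((end - begin) / one_week)
    # Index of the week each login falls in (integer floor division of timedeltas).
    weeks_entry = (login - begin) // one_week
    seconds = (logoff - login) / np.timedelta64(1, 's')
    # minlength pads trailing empty weeks; bincount never truncates, so a login
    # on the final Monday would simply add one more week.
    totals = np.bincount(weeks_entry, weights=seconds, minlength=n_weeks)
    return totals.astype(np.int64).astype('timedelta64[s]')

# The question's log rewritten as naive UTC (offsets applied by hand).
log = np.array([['2015-05-08T13:46:06', '2015-05-08T15:21:36'],
                ['2015-05-08T15:10:53', '2015-05-09T04:30:08'],
                ['2015-08-09T20:38:45', '2015-08-09T20:38:45'],
                ['2015-08-09T20:41:33', '2015-08-10T06:39:26'],
                ['2015-08-11T15:25:52', '2015-08-12T06:14:30'],
                ['2015-08-13T11:12:08', '2015-08-13T17:42:50'],
                ['2015-08-13T15:30:18', '2015-08-14T08:13:10'],
                ['2015-10-20T11:42:07', '2015-10-20T14:13:37'],
                ['2015-10-21T08:27:05', '2015-10-21T14:13:11'],
                ['2015-12-05T12:28:51', '2015-12-05T20:43:20']], dtype='datetime64[s]')
weekly = weekly_connection_time(log)
print(weekly.shape, weekly.dtype)   # (31,) timedelta64[s]
```

Dividing the result by np.timedelta64(1, 'h') gives fractional hours if you prefer those to seconds.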


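Disclaimer: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.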

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM