Filtering a numpy array of tuples

Question

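The scikit-learn library has a brilliant example of data clustering: stock market structure. It works fine with US stocks. But when one adds tickers from other markets, numpy raises an error saying the arrays should have the same size. That is true: German stocks, for example, have a different trading calendar.

OK, after downloading the quotes I prepare the shared dates: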

quotes = [quotes_historical_yahoo_ochl(symbol, d1, d2, asobject=True)
          for symbol in symbols]


def intersect(list_1, list_2):
    return list(set(list_1) & set(list_2))

dates_all = quotes[0].date
for q in quotes:
    dates_symbol = q.date
    dates_all = intersect(dates_all, dates_symbol)

Then I got stuck filtering the numpy array of tuples. Here are some attempts:

# for index, q in enumerate(quotes):
#     filtered = [i for i in q if i.date in dates_all]

#     quotes[index] = np.rec.array(filtered, dtype=q.dtype)
#     quotes[index] = np.asanyarray(filtered, dtype=q.dtype)
#
#     quotes[index] = np.where(a.date in dates_all for a in q)
#
#     quotes[index] = np.where(q[0].date in dates_all)

How do I apply a filter to a numpy array, or how do I properly convert a list of records (after filtering) back to a numpy recarray?

quotes[0].dtype:

'(numpy.record, [('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'), ('d', '<f8'), ('open', '<f8'), ('close', '<f8'), ('high', '<f8'), ('low', '<f8'), ('volume', '<f8'), ('aclose', '<f8')])'

quotes[0].shape:

<class 'tuple'>: (261,)

Answer 1

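So quotes is a list of recarrays, and in dates_all you collect the intersection of all values in the date field.

I can recreate one such array with:

In [286]: dt=np.dtype([('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 
     ...: 'i1'), ('d', '<f8'), ('open', '<f8'), ('close', '<f8'), ('high', '<f8'
     ...: ), ('low', '<f8'), ('volume', '<f8'), ('aclose', '<f8')])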
In [287]: 
In [287]: arr=np.ones((2,), dtype=dt)  # 2 element structured array
In [288]: arr
Out[288]: 
array([(1, 1, 1, 1,  1.,  1.,  1.,  1.,  1.,  1.,  1.),
       (1, 1, 1, 1,  1.,  1.,  1.,  1.,  1.,  1.,  1.)], 
      dtype=[('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'), ... ('aclose', '<f8')])
In [289]: type(arr[0])
Out[289]: numpy.void

Make that into a recarray (I don't use those as much as plain structured arrays):

In [291]: np.rec.array(arr)
Out[291]: 
rec.array([(1, 1, 1, 1,  1.,  1.,  1.,  1.,  1.,  1.,  1.),
 (1, 1, 1, 1,  1.,  1.,  1.,  1.,  1.,  1.,  1.)], 
          dtype=[('date', 'O'), ('year', '<i2'), ('month', 'i1'), ('day', 'i1'), .... ('aclose', '<f8')])

The dtype of the recarray displays slightly differently:

In [292]: _.dtype
Out[292]: dtype((numpy.record, [('date', 'O'), ('year', '<i2'), ('month', 'i1'), ....('aclose', '<f8')]))
In [293]: __.date
Out[293]: array([1, 1], dtype=object)

In any case, the date field is an array of objects, possibly datetime?

q is one of these arrays; i is an element, and i.date is the date field.
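To make the object-valued date field concrete, here is a small sketch. It assumes the dates are datetime.date objects and uses a reduced dtype (only date and close) instead of the full eleven-field one; the sample rows are made up.

```python
import datetime
import numpy as np

# Reduced, hypothetical dtype: two of the eleven fields shown above.
dt = np.dtype([('date', 'O'), ('close', '<f8')])

arr = np.array(
    [(datetime.date(2017, 1, 2), 10.0),
     (datetime.date(2017, 1, 3), 11.0)],
    dtype=dt,
)

# np.rec.array copies the data and enables attribute access to the fields.
rec = np.rec.array(arr)

print(arr['date'].dtype)        # object
print(rec.date)                 # same field, reached as an attribute
print(type(arr[0]).__name__)    # void   (element of a structured array)
print(type(rec[0]).__name__)    # record (element of a recarray)
```

Either way the field holds ordinary Python objects, so membership tests such as `d in dates_all` compare them with the usual `==`.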

 [i for i in q if i.date in dates_all]

filtered is a list of recarray elements. np.stack does a better job of reassembling them into an array (it works with a recarray too):

np.stack([i for i in arr if i['date'] in alist])
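A slightly fuller sketch of the np.stack route, again with a reduced, made-up dtype and made-up rows. It assumes you want a recarray back, so the stacked result is viewed as np.recarray; the guard for an empty match list is an addition, since np.stack raises ValueError when given no arrays.

```python
import datetime
import numpy as np

dt = np.dtype([('date', 'O'), ('close', '<f8')])   # reduced, hypothetical dtype
q = np.rec.array(
    [(datetime.date(2017, 1, 2), 10.0),
     (datetime.date(2017, 1, 3), 11.0),
     (datetime.date(2017, 1, 4), 12.0)],
    dtype=dt,
)

# Stand-in for the shared dates computed in the question.
dates_all = {datetime.date(2017, 1, 2), datetime.date(2017, 1, 4)}

# List of individual records whose date is shared.
filtered = [i for i in q if i.date in dates_all]

if filtered:
    # Reassemble the records into one array, then restore attribute access.
    result = np.stack(filtered).view(np.recarray)
else:
    # No matches: keep an empty recarray with the same dtype.
    result = q[:0]

print(result.shape)
print(result.close)
```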

Or you could collect the indices of the matching records and index the quotes array with them:

In [319]: [i for i,v in enumerate(arr) if v['date'] in alist]
Out[319]: [0, 1]
In [320]: arr[_]

Or pull out the date field first:

In [321]: [i for i,v in enumerate(arr['date']) if v in alist]
Out[321]: [0, 1]
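The index-based variant can be sketched end to end like this. The reduced dtype and the rows are made up, and alist is taken to be a set here (an assumption; a list works too, only with slower lookups).

```python
import datetime
import numpy as np

dt = np.dtype([('date', 'O'), ('close', '<f8')])   # reduced, hypothetical dtype
arr = np.array(
    [(datetime.date(2017, 1, 2), 10.0),
     (datetime.date(2017, 1, 3), 11.0),
     (datetime.date(2017, 1, 4), 12.0)],
    dtype=dt,
)

# Stand-in for the shared dates; a set keeps each lookup cheap.
alist = {datetime.date(2017, 1, 3), datetime.date(2017, 1, 4)}

# Positions of the rows whose date is shared.
idx = [k for k, d in enumerate(arr['date']) if d in alist]

# Indexing with the list of positions keeps the structured dtype.
subset = arr[idx]

print(idx)
print(subset['close'])
```

Because the original array is indexed directly, nothing has to be converted back from a list of records.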

np.in1d might also work for the search:

In [322]: np.in1d(arr['date'],alist)
Out[322]: array([ True,  True], dtype=bool)
In [323]: np.where(np.in1d(arr['date'],alist))
Out[323]: (array([0, 1], dtype=int32),)
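Putting the pieces together, a boolean mask can be applied to every array in quotes so that they all end up with the same length. This is only a sketch: make_quotes is a hypothetical helper that fabricates data in place of the Yahoo download, the dtype is reduced to two fields, and np.isin is used as the newer spelling of np.in1d.

```python
import datetime
import numpy as np

dt = np.dtype([('date', 'O'), ('close', '<f8')])   # reduced, hypothetical dtype

def make_quotes(days, start_price):
    """Build a small recarray of made-up quotes for the given January 2017 days."""
    rows = [(datetime.date(2017, 1, d), start_price + n)
            for n, d in enumerate(days)]
    return np.rec.array(rows, dtype=dt)

# Three "markets" with different trading calendars.
quotes = [
    make_quotes([2, 3, 4, 5, 6], 10.0),
    make_quotes([2, 4, 5, 6], 50.0),
    make_quotes([3, 4, 5, 6], 90.0),
]

# Dates present in every array.
dates_all = sorted(set.intersection(*(set(q.date) for q in quotes)))

# Keep only the rows whose date is shared; boolean indexing keeps the recarray type.
quotes = [q[np.isin(q.date, dates_all)] for q in quotes]

# All arrays now have equal length, so a field can be stacked into a 2-D array.
closes = np.vstack([q.close for q in quotes])

print([q.shape for q in quotes])
print(closes)
```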

Answer 1
0 accepted 2017-05-02 06:29:08
