Unexpected results of min() and max() methods of Pandas series made of Timestamp objects
I ran into this behavior while doing some basic data munging, as in the following example:
In [55]: import pandas as pd
In [56]: import numpy as np
In [57]: rng = pd.date_range('1/1/2000', periods=10, freq='4h')
In [58]: lvls = ['A','A','A','B','B','B','C','C','C','C']
In [59]: df = pd.DataFrame({'TS': rng, 'V' : np.random.randn(len(rng)), 'L' : lvls})
In [60]: df
Out[60]:
L TS V
0 A 2000-01-01 00:00:00 -1.152371
1 A 2000-01-01 04:00:00 -2.035737
2 A 2000-01-01 08:00:00 -0.493008
3 B 2000-01-01 12:00:00 -0.279055
4 B 2000-01-01 16:00:00 -0.132386
5 B 2000-01-01 20:00:00 0.584091
6 C 2000-01-02 00:00:00 -0.297270
7 C 2000-01-02 04:00:00 -0.949525
8 C 2000-01-02 08:00:00 0.517305
9 C 2000-01-02 12:00:00 -1.142195
The problem:
In [61]: df['TS'].min()
Out[61]: 31969-04-01 00:00:00
In [62]: df['TS'].max()
Out[62]: 31973-05-10 00:00:00
These, however, look fine:
In [63]: df['V'].max()
Out[63]: 0.58409076701429163
In [64]: min(df['TS'])
Out[64]: <Timestamp: 2000-01-01 00:00:00>
When aggregating after a groupby, the timestamps come back as floats:
In [65]: df.groupby('L').min()
Out[65]:
TS V
L
A 9.466848e+17 -2.035737
B 9.467280e+17 -0.279055
C 9.467712e+17 -1.142195
In [81]: val = df.groupby('L').agg('min')['TS']['A']
In [82]: type(val)
Out[82]: numpy.float64
Apparently, in this particular case it has something to do with passing a DatetimeIndex that has a frequency as the argument to pd.Series:
In [76]: rng.min()
Out[76]: <Timestamp: 2000-01-01 00:00:00>
In [77]: ts = pd.Series(rng)
In [78]: ts.min()
Out[78]: 31969-04-01 00:00:00
In [79]: type(ts.min())
Out[79]: numpy.datetime64
However, my original problem was with the min/max of a series of timestamps parsed from strings via pd.read_csv().
What am I doing wrong?
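As @meteore points out, this is a problem with the string repr of the np.datetime64 type in NumPy 1.6.x. The underlying data should still be correct. To work around it, you can do something like: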
In [15]: df
Out[15]:
L TS V
0 A 2000-01-01 00:00:00 0.752035
1 A 2000-01-01 04:00:00 -1.047444
2 A 2000-01-01 08:00:00 1.177557
3 B 2000-01-01 12:00:00 0.394590
4 B 2000-01-01 16:00:00 1.835067
5 B 2000-01-01 20:00:00 -0.768274
6 C 2000-01-02 00:00:00 -0.564037
7 C 2000-01-02 04:00:00 -2.644367
8 C 2000-01-02 08:00:00 -0.571187
9 C 2000-01-02 12:00:00 1.618557
In [16]: df.TS.astype(object).min()
Out[16]: datetime.datetime(2000, 1, 1, 0, 0)
In [17]: df.TS.astype(object).max()
Out[17]: datetime.datetime(2000, 1, 2, 12, 0)
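One way to check that the underlying data is intact, independent of how datetime64 values are printed, is to look at the raw integers. The snippet below is a minimal sketch, assuming a reasonably recent pandas and NumPy; the names raw, ts_min, ts_max and from_float are only illustrative and not part of the original session. It also shows that a float such as the 9.466848e+17 returned by the groupby above is a count of nanoseconds since the Unix epoch and can be turned back into a Timestamp.

```python
import numpy as np
import pandas as pd

# Same kind of series as in the question: 10 timestamps, 4 hours apart.
rng = pd.date_range('1/1/2000', periods=10, freq='4h')
ts = pd.Series(rng)

# View the values as int64 nanoseconds since the Unix epoch.
# This sidesteps the datetime64 repr entirely.
raw = ts.values.astype('datetime64[ns]').astype(np.int64)

# Rebuild pandas Timestamps from the smallest and largest raw values.
ts_min = pd.Timestamp(int(raw.min()))
ts_max = pd.Timestamp(int(raw.max()))

# A float like the one in the groupby output is the same nanosecond count.
from_float = pd.Timestamp(int(9.466848e+17))

print(ts_min, ts_max, from_float)
```

If the integers convert back to the expected dates, only the display was wrong, which matches the explanation above.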