Create a timestamp that allows partial indexing in a dataframe made of lists in pandas
I'm having a problem in pandas with Timestamp selections.
From what I can see, this is a problem other people have also run into ( Selecting a subset of a Pandas DataFrame indexed by DatetimeIndex with a list of TimeStamps ), but the pandas developers unfortunately do not accept it as a bug ( https://github.com/pydata/pandas/issues/2437 ).
In any case, I couldn't follow the workaround proposed in the SO post quoted above, since my data doesn't come from a CSV file but from a number of lists (I actually get it from the internet as JSON and convert that to lists).
The data I get looks something like this:
the_dataTransactions
[{u'date': u'1365100630', u'tid': 240264, u'price': u'132.58', u'amount': u'1.28309000'}, {u'date': u'1365100630', u'tid': 240263, u'price': u'132.58', u'amount': u'1.20294000'}, {u'date': u'1365100629', u'tid': 240262, u'price': u'132.58', u'amount': u'0.90893940'}]
And I convert it to:
transactionsDate
[datetime.datetime(2013, 4, 4, 19, 37, 10), datetime.datetime(2013, 4, 4, 19, 37, 10), datetime.datetime(2013, 4, 4, 19, 37, 9)]
I also tried this, but the error when I try to select a date range was the same:
transactionsDate
[<Timestamp: 2013-04-04 19:37:10>, <Timestamp: 2013-04-04 19:37:10>, <Timestamp: 2013-04-04 19:37:09>]
The tid, price and amount were also added to a data frame, like this:
>>> transactionsDF.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 73 entries, 2013-04-04 19:37:10 to 2013-04-04 19:22:49
Data columns:
tid 73 non-null values
price 73 non-null values
amount 73 non-null values
dtypes: float64(2), int64(1)
>>> transactionsDF.head()
tid price amount
2013-04-04 19:37:10 240264 132.58 1.283090
2013-04-04 19:37:10 240264 132.58 1.283090
2013-04-04 19:37:10 240263 132.58 1.202940
2013-04-04 19:37:09 240262 132.58 0.908939
2013-04-04 19:37:09 240261 132.59 0.213051
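For reference, a minimal sketch of how a frame like this might be built from the list of dicts, assuming a recent pandas. The conversion steps are an assumption, since the question does not show them. Note that pd.to_datetime(..., unit="s") reads the epoch seconds as UTC, so the resulting times differ from the listing above by the poster's local offset.

```python
import pandas as pd

# Sample records copied from the question; price and amount arrive as strings.
the_dataTransactions = [
    {"date": "1365100630", "tid": 240264, "price": "132.58", "amount": "1.28309000"},
    {"date": "1365100630", "tid": 240263, "price": "132.58", "amount": "1.20294000"},
    {"date": "1365100629", "tid": 240262, "price": "132.58", "amount": "0.90893940"},
]

# Epoch seconds (as strings) -> DatetimeIndex, interpreted as UTC.
transactionsDate = pd.to_datetime(
    [int(record["date"]) for record in the_dataTransactions], unit="s"
)

# Build the frame with numeric columns and the timestamps as the index.
transactionsDF = pd.DataFrame(
    {
        "tid": [record["tid"] for record in the_dataTransactions],
        "price": [float(record["price"]) for record in the_dataTransactions],
        "amount": [float(record["amount"]) for record in the_dataTransactions],
    },
    index=transactionsDate,
)

# The feed is newest-first, so the index is descending, not ascending.
print(transactionsDF.index.is_monotonic_increasing)
```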
But when I try to select a date range using the normal notation, I get the same error reported in the other post:
>>> transactionsDF['2013-04-03 18:00:00':'2013-04-04 19:00:00']
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/frame.py", line 1951, in __getitem__
indexer = self.ix._convert_to_indexer(key, axis=0)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/indexing.py", line 478, in _convert_to_indexer
i, j = labels.slice_locs(start, stop)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tseries/index.py", line 1153, in slice_locs
start_loc = self._get_string_slice(start).start
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tseries/index.py", line 1143, in _get_string_slice
loc = self._partial_date_slice(reso, parsed)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/tseries/index.py", line 1041, in _partial_date_slice
raise TimeSeriesError('Partial indexing only valid for ordered '
pandas.tseries.index.TimeSeriesError: Partial indexing only valid for ordered time series.
My data seems to be an ordered time series. Can you think of a workaround for this pandas glitch in this particular case?
UPDATE (Solved?): I found a way that is so simple that I'm not completely sure it will give the correct answer every time, but at least for a small data frame it works. The code is just:
transactionsDF = transactionsDF.sort_index()
After this it seems to work fine and lets me select a date range like I used to with other data: transactionsDF['2013-04-04 19:30':'2013-04-04 19:35']
Perhaps someone more knowledgeable can validate or invalidate this workaround.
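A small sketch of how the workaround could be checked, assuming a recent pandas. The index in the question runs from newest to oldest, which is probably why pandas does not treat it as ordered; sorting makes it ascending. The frame below is illustrative: the 19:31:02 row and its values are hypothetical, added only so that the slice has something to return, and kind="mergesort" is my choice to keep rows with equal timestamps in their original relative order.

```python
import pandas as pd

# Newest-first index with a duplicate timestamp, as in the question.
# The last row (19:31:02, tid 240100) is hypothetical, for illustration only.
idx = pd.to_datetime(
    [
        "2013-04-04 19:37:10",
        "2013-04-04 19:37:10",
        "2013-04-04 19:37:09",
        "2013-04-04 19:31:02",
    ]
)
df = pd.DataFrame(
    {"tid": [240264, 240263, 240262, 240100], "price": [132.58, 132.58, 132.58, 132.50]},
    index=idx,
)

# Descending data is not "ordered" in the ascending sense that slicing expects.
print(df.index.is_monotonic_increasing)

# A stable sort keeps duplicate timestamps in their original relative order.
df_sorted = df.sort_index(kind="mergesort")
print(df_sorted.index.is_monotonic_increasing)

# Partial string slicing on the sorted index; .loc makes the row selection explicit.
subset = df_sorted.loc["2013-04-04 19:30":"2013-04-04 19:35"]
print(subset)
```

Checking is_monotonic_increasing before slicing is a cheap way to confirm whether the sort is still needed. Duplicate timestamps do not prevent slicing once the index is ascending.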
I think there is no truly elegant solution. Pandas does not like duplicate indexes (at least the slightly old version that I have). You can create DataFrames with duplicate indexes, but you can't access their content comfortably.
Therefore you should put the dates into a separate column. Then you access the interesting rows with comparison operators on the dates, and fancy indexing:
In [1]: import pandas as pd
In [5]: import datetime
In [15]: f1 = pd.DataFrame([{u'date': u'1365100630', u'tid': 240264, u'price': u'132.58', u'amount': u'1.28309000'}, {u'date': u'1365100630', u'tid': 240263, u'price': u'132.58', u'amount': u'1.20294000'}, {u'date': u'1365100629', u'tid': 240262, u'price': u'132.58', u'amount': u'0.90893940'}])
In [16]: f1["dates"] = [datetime.datetime(2013, 4, 4, 19, 37, 10), datetime.datetime(2013, 4, 4, 19, 37, 10), datetime.datetime(2013, 4, 4, 19, 37, 9)]
In [17]: f1
Out[17]:
amount date price tid dates
0 1.28309000 1365100630 132.58 240264 2013-04-04 19:37:10
1 1.20294000 1365100630 132.58 240263 2013-04-04 19:37:10
2 0.90893940 1365100629 132.58 240262 2013-04-04 19:37:09
In [25]: matching = (f1["dates"] >= datetime.datetime(2013, 4, 4, 19, 37, 10)) & (f1["dates"] < datetime.datetime(2013, 4, 4, 20, 00, 00))
In [26]: f1.ix[matching]
Out[26]:
amount date price tid dates
0 1.28309000 1365100630 132.58 240264 2013-04-04 19:37:10
1 1.20294000 1365100630 132.58 240263 2013-04-04 19:37:10
You can also use f1[matching] to access the interesting rows, but I find it less clear, because f1["foo"] is used to access columns.
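On current pandas the .ix indexer no longer exists (it was deprecated and then removed in pandas 1.0), so the same selection would be written with .loc. A minimal sketch, assuming a recent pandas; the names start, stop and selected are mine, not from the session above:

```python
import datetime

import pandas as pd

# Dates kept in an ordinary column, so duplicate timestamps are not a concern.
f1 = pd.DataFrame(
    {
        "tid": [240264, 240263, 240262],
        "price": [132.58, 132.58, 132.58],
        "dates": [
            datetime.datetime(2013, 4, 4, 19, 37, 10),
            datetime.datetime(2013, 4, 4, 19, 37, 10),
            datetime.datetime(2013, 4, 4, 19, 37, 9),
        ],
    }
)

# Half-open interval [start, stop), built from two element-wise comparisons.
start = datetime.datetime(2013, 4, 4, 19, 37, 10)
stop = datetime.datetime(2013, 4, 4, 20, 0, 0)
matching = (f1["dates"] >= start) & (f1["dates"] < stop)

# .loc with a boolean Series selects the rows where the mask is True.
selected = f1.loc[matching]
print(selected)
```

This approach does not depend on the rows being sorted, at the cost of scanning the whole column for each selection.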