I'm creating a DataFrame in pandas:
import pandas

df_data = dict()
for x in data:
    series = pandas.Series(x['value']['values'], index=x['value']['timestamps'])
    df_data[x['_id']] = series
df = pandas.DataFrame(df_data)
data is a list of dicts in this format:
{u'_id': u'770000000049',
 u'value': {u'timestamps': [datetime.datetime(2012, 7, 25, 10, 16, 1, 270000),
                            datetime.datetime(2012, 7, 25, 10, 18, 29, 745000),
                            datetime.datetime(2012, 7, 25, 10, 21, 54, 931000),
                            datetime.datetime(2012, 7, 25, 10, 23, 18, 896000)],
            u'values': [204.0, 16.788, 139.2, 116.004]}}
Printing an example series gives me:
>>> print df_data['770000000049']
2012-07-25 10:16:01.270000 204.000
2012-07-25 10:18:29.745000 16.788
2012-07-25 10:21:54.931000 139.200
2012-07-25 10:23:18.896000 116.004
As expected. However, printing the resulting DataFrame gives me:
>>> print df['770000000049']
1992-06-05 15:50:11.527680 NaN
2181-10-17 22:55:34.850625 NaN
2215-08-27 21:41:15.306049 NaN
1936-05-22 00:55:45.848401 NaN
1783-06-08 06:38:26.257076 NaN
2017-03-12 18:30:17.469108 NaN
2209-08-06 03:45:09.779652 NaN
1768-02-06 12:00:22.653272 NaN
1916-07-20 06:51:31.628376 NaN
2086-01-25 18:30:58.261336 NaN
1940-08-26 15:13:33.790568 NaN
1712-12-17 22:48:01.743241 NaN
1803-06-16 16:32:58.309017 NaN
1981-11-05 04:38:27.140059 NaN
2246-05-25 09:09:27.875035 NaN
...
WTF! The data is all wrong: both the index (the timestamps) and the values are completely wrong.
What am I doing wrong?
Edit: Printing df gives me:
DatetimeIndex: 386 entries, 1992-06-05 15:50:11.527680 to 1774-08-13 02:00:15.237103
Data columns:
770000000006 0 non-null values
770000000009 0 non-null values
770000000010 0 non-null values
770000000011 0 non-null values
770000000012 0 non-null values
770000000013 0 non-null values
770000000018 0 non-null values
770000000020 0 non-null values
770000000021 0 non-null values
770000000022 0 non-null values
770000000024 0 non-null values
770000000029 0 non-null values
770000000030 0 non-null values
770000000032 0 non-null values
770000000034 0 non-null values
770000000049 0 non-null values
dtypes: float64(16)
Completely wrong
Edit 2:
I've written a module that reproduces the bug for me.
EDIT: It is a bug. I (Wes) fixed it here: https://github.com/pydata/pandas/commit/aea7c4522bd7beffd0df80efee818873110609fa
It turns out it's not a bug:
While pandas does not force you to have a sorted date index, some of these methods may have unexpected or incorrect behavior if the dates are unsorted. So please be careful.
Sorting the dates at the DB level fixed the issue for me.
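If sorting at the database level is not an option, the same effect can probably be had on the Python side by sorting each series by its index before building the frame. The following is only a sketch under assumptions: it uses Python 3 and a recent pandas rather than the versions from the question, build_frame is a hypothetical helper name, and the sample record is the one from the question with its entries deliberately shuffled out of time order.

```python
import datetime
import pandas as pd


def build_frame(data):
    """Hypothetical helper: one time-sorted Series per record, combined into a DataFrame."""
    columns = {}
    for record in data:
        series = pd.Series(
            record['value']['values'],
            index=pd.DatetimeIndex(record['value']['timestamps']),
        )
        # Sort by timestamp so the index is monotonic before the frame is built.
        columns[record['_id']] = series.sort_index()
    return pd.DataFrame(columns)


# Sample record from the question, shuffled out of time order for illustration.
data = [{
    '_id': '770000000049',
    'value': {
        'timestamps': [
            datetime.datetime(2012, 7, 25, 10, 21, 54, 931000),
            datetime.datetime(2012, 7, 25, 10, 16, 1, 270000),
            datetime.datetime(2012, 7, 25, 10, 23, 18, 896000),
            datetime.datetime(2012, 7, 25, 10, 18, 29, 745000),
        ],
        'values': [139.2, 204.0, 116.004, 16.788],
    },
}]

df = build_frame(data)
print(df['770000000049'])
```

Sorting each series individually keeps every value attached to its own timestamp, which sorting the two lists separately would not.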
I ran the snippet you pasted and it seems fine to me. What version of pandas/numpy are you using? Can you post all/more of the data?
In [26]: paste
{u'_id': u'770000000049',
u'value': {u'timestamps': [datetime.datetime(2012, 7, 25, 10, 16, 1, 270000),
datetime.datetime(2012, 7, 25, 10, 18, 29, 745000),
datetime.datetime(2012, 7, 25, 10, 21, 54, 931000),
datetime.datetime(2012, 7, 25, 10, 23, 18, 896000)],
u'values': [204.0, 16.788, 139.2, 116.004]}}
## -- End pasted text --
Out[26]:
{u'_id': u'770000000049',
u'value': {u'timestamps': [datetime.datetime(2012, 7, 25, 10, 16, 1, 270000),
datetime.datetime(2012, 7, 25, 10, 18, 29, 745000),
datetime.datetime(2012, 7, 25, 10, 21, 54, 931000),
datetime.datetime(2012, 7, 25, 10, 23, 18, 896000)],
u'values': [204.0, 16.788, 139.2, 116.004]}}
In [27]: data = [_]
In [28]: paste
df_data = dict()
for x in data:
    series = pandas.Series(x['value']['values'], index=x['value']['timestamps'])
    df_data[x['_id']] = series
df = pandas.DataFrame(df_data)
## -- End pasted text --
In [29]: print df['770000000049']
2012-07-25 10:16:01.270000 204.000
2012-07-25 10:18:29.745000 16.788
2012-07-25 10:21:54.931000 139.200
2012-07-25 10:23:18.896000 116.004
Name: 770000000049
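To answer the version question, something like the following should print what is needed. It is a minimal sketch that assumes both libraries are importable in the environment where the problem occurs, and it uses Python 3 print syntax.

```python
import sys

import numpy
import pandas

# Report the interpreter and library versions to include in the question.
print("python:", sys.version.split()[0])
print("numpy: ", numpy.__version__)
print("pandas:", pandas.__version__)
```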