Pandas messing up dataframe

I'm creating a data frame in pandas:

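import pandas

# Build one Series per record, keyed by the record's _id.
df_data = dict()

for x in data:
    series = pandas.Series(x['value']['values'], index=x['value']['timestamps'])

    df_data[x['_id']] = series

# Combine the Series into one DataFrame; rows are aligned on the timestamps.
df = pandas.DataFrame(df_data)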

data is a list of dicts, each in this format:

{u'_id': u'770000000049',
 u'value': {u'timestamps': [datetime.datetime(2012, 7, 25, 10, 16, 1, 270000),
                            datetime.datetime(2012, 7, 25, 10, 18, 29, 745000),
                            datetime.datetime(2012, 7, 25, 10, 21, 54, 931000),
                            datetime.datetime(2012, 7, 25, 10, 23, 18, 896000)],
            u'values': [204.0, 16.788, 139.2, 116.004]}}

Printing an example series gives me:

>>> print df_data['770000000049']
2012-07-25 10:16:01.270000    204.000
2012-07-25 10:18:29.745000     16.788
2012-07-25 10:21:54.931000    139.200
2012-07-25 10:23:18.896000    116.004

As expected. However, printing the same column from the resulting data frame gives me:

>>> print df['770000000049']
1992-06-05 15:50:11.527680   NaN
2181-10-17 22:55:34.850625   NaN
2215-08-27 21:41:15.306049   NaN
1936-05-22 00:55:45.848401   NaN
1783-06-08 06:38:26.257076   NaN
2017-03-12 18:30:17.469108   NaN
2209-08-06 03:45:09.779652   NaN
1768-02-06 12:00:22.653272   NaN
1916-07-20 06:51:31.628376   NaN
2086-01-25 18:30:58.261336   NaN
1940-08-26 15:13:33.790568   NaN
1712-12-17 22:48:01.743241   NaN
1803-06-16 16:32:58.309017   NaN
1981-11-05 04:38:27.140059   NaN
2246-05-25 09:09:27.875035   NaN
...

WTF! The data is all wrong. Both keys and values are completely wrong.

What am I doing wrong?

Edit: Printing df gives me:

DatetimeIndex: 386 entries, 1992-06-05 15:50:11.527680 to 1774-08-13 02:00:15.237103
Data columns:
770000000006    0  non-null values
770000000009    0  non-null values
770000000010    0  non-null values
770000000011    0  non-null values
770000000012    0  non-null values
770000000013    0  non-null values
770000000018    0  non-null values
770000000020    0  non-null values
770000000021    0  non-null values
770000000022    0  non-null values
770000000024    0  non-null values
770000000029    0  non-null values
770000000030    0  non-null values
770000000032    0  non-null values
770000000034    0  non-null values
770000000049    0  non-null values
dtypes: float64(16)

This is completely wrong: every column has 0 non-null values, and the index spans dates that are nowhere in my data.

Edit 2:

I've written a module that reproduces the bug for me.

EDIT: It is a bug. I (Wes) fixed it here: https://github.com/pydata/pandas/commit/aea7c4522bd7beffd0df80efee818873110609fa


It turns out it's not a bug

While pandas does not force you to have a sorted date index, some of these methods may have unexpected or incorrect behavior if the dates are unsorted. So please be careful.

Sorting the dates at the DB level fixed the issue for me.
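
If sorting at the DB level is not an option, the same thing can probably be done on the pandas side before building the frame. The following is only a sketch: it assumes a recent pandas where Series.sort_index and Index.is_monotonic_increasing are available, and the record is a made-up reordering of the sample from the question, not real data.

```python
import datetime

import pandas as pd

# Hypothetical record in the question's format, with the timestamps
# deliberately out of order to mimic unsorted results from the DB.
data = [
    {
        "_id": "770000000049",
        "value": {
            "timestamps": [
                datetime.datetime(2012, 7, 25, 10, 21, 54, 931000),
                datetime.datetime(2012, 7, 25, 10, 16, 1, 270000),
                datetime.datetime(2012, 7, 25, 10, 23, 18, 896000),
                datetime.datetime(2012, 7, 25, 10, 18, 29, 745000),
            ],
            "values": [139.2, 204.0, 116.004, 16.788],
        },
    }
]

df_data = {}
for x in data:
    series = pd.Series(x["value"]["values"], index=x["value"]["timestamps"])
    # Sort only when needed, so already-ordered series are left untouched.
    if not series.index.is_monotonic_increasing:
        series = series.sort_index()
    df_data[x["_id"]] = series

df = pd.DataFrame(df_data)
print(df["770000000049"])
```

Each value stays attached to its own timestamp; only the row order changes, so the column should come out in chronological order.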

I ran the snippet you pasted and it seems fine to me. What version of pandas/numpy are you using? Can you post all/more of the data?

In [26]: paste
{u'_id': u'770000000049',
 u'value': {u'timestamps': [datetime.datetime(2012, 7, 25, 10, 16, 1, 270000),
                            datetime.datetime(2012, 7, 25, 10, 18, 29, 745000),
                            datetime.datetime(2012, 7, 25, 10, 21, 54, 931000),
                            datetime.datetime(2012, 7, 25, 10, 23, 18, 896000)],
            u'values': [204.0, 16.788, 139.2, 116.004]}}
## -- End pasted text --
Out[26]: 
{u'_id': u'770000000049',
 u'value': {u'timestamps': [datetime.datetime(2012, 7, 25, 10, 16, 1, 270000),
   datetime.datetime(2012, 7, 25, 10, 18, 29, 745000),
   datetime.datetime(2012, 7, 25, 10, 21, 54, 931000),
   datetime.datetime(2012, 7, 25, 10, 23, 18, 896000)],
  u'values': [204.0, 16.788, 139.2, 116.004]}}

In [27]: data = [_]

In [28]: paste
df_data = dict()

for x in data:
    series = pandas.Series(x['value']['values'], index=x['value']['timestamps'])

    df_data[x['_id']] = series

df = pandas.DataFrame(df_data)
## -- End pasted text --

In [29]: print df['770000000049']
2012-07-25 10:16:01.270000    204.000
2012-07-25 10:18:29.745000     16.788
2012-07-25 10:21:54.931000    139.200
2012-07-25 10:23:18.896000    116.004
Name: 770000000049
