
pandas df.apply TypeError data type not understood

I'm trying to apply an operation to every value in a datetime series. To illustrate the problem, I've reduced it to a lambda that just prints each value. The same call works on another, similar DataFrame, but not on this one. Python version is 3.5.1, pandas version 0.17.1.


print(dfY.info())
print(dfY)
dfY.apply(lambda rr: print(rr['predicted_time']), 1)

output

<class 'pandas.core.frame.DataFrame'>
Int64Index: 21 entries, 0 to 20
Data columns (total 1 columns):
predicted_time    21 non-null datetime64[ns, pytz.FixedOffset(60)]
dtypes: datetime64[ns, pytz.FixedOffset(60)](1)
memory usage: 336.0 bytes
None
              predicted_time
0  2005-02-01 02:40:00+01:00
1  2005-02-01 02:40:00+01:00
2  2005-02-01 02:40:00+01:00
3  2005-02-01 02:40:00+01:00
4  2005-02-01 02:43:00+01:00
5  2005-02-01 02:43:00+01:00
6  2005-02-01 02:43:00+01:00
<snip>
19 2005-02-01 02:50:00+01:00
20 2005-02-01 02:50:00+01:00

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-8ae0cf570812> in <module>()
      1 print(dfY.info())
      2 print(dfY)
----> 3 dfY.apply(lambda rr: print(rr['predicted_time']), 1)

/.../Projects/Software/TimeTillComplete/venv/lib/python3.5/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   3970                     if reduce is None:
   3971                         reduce = True
-> 3972                     return self._apply_standard(f, axis, reduce=reduce)
   3973             else:
   3974                 return self._apply_broadcast(f, axis)

/.../Projects/Software/TimeTillComplete/venv/lib/python3.5/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4017             # Create a dummy Series from an empty array
   4018             index = self._get_axis(axis)
-> 4019             empty_arr = np.empty(len(index), dtype=values.dtype)
   4020             dummy = Series(empty_arr, index=self._get_axis(axis),
   4021                            dtype=values.dtype)

TypeError: data type not understood
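The last frame of the traceback hints at the cause. `_apply_standard` passes `values.dtype` to `np.empty`, and for a time-zone-aware column that dtype appears to be pandas' own `DatetimeTZDtype` (printed as `datetime64[ns, ...]`), which is not a NumPy dtype. The sketch below is an illustration under that assumption, not a trace of pandas internals: it builds a column like `dfY['predicted_time']` from one of the printed timestamps and shows NumPy rejecting the tz-aware dtype while accepting a naive `datetime64[ns]`.

```python
import numpy as np
import pandas as pd

# A tz-aware column like dfY['predicted_time'] (fixed +01:00 offset).
aware = pd.Series(pd.to_datetime(["2005-02-01 02:40:00+01:00"]))
aware_dtype = aware.dtype  # a pandas DatetimeTZDtype, not a NumPy dtype

# NumPy does not understand pandas' tz-aware dtype, so allocation fails.
try:
    np.empty(len(aware), dtype=aware_dtype)
    numpy_accepts_aware = True
except TypeError as exc:
    numpy_accepts_aware = False
    print("TypeError:", exc)

# A naive datetime64[ns] dtype is a plain NumPy dtype and works fine.
naive_arr = np.empty(len(aware), dtype="datetime64[ns]")
print(naive_arr.dtype)  # datetime64[ns]
```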

I don't really know what's going on, but as a workaround you can get the expected output by calling apply() on the column instead:

dfY['predicted_time'].apply(lambda rr: print(rr))
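If the goal is a value per timestamp rather than printing, `Series.apply` on the column returns a new Series, and for common per-timestamp operations the vectorized `.dt` accessor avoids the Python-level loop altogether. A minimal sketch; the stand-in series `times` and the `HH:MM` formatting are illustrative choices, not part of the original code:

```python
import pandas as pd

# Stand-in for dfY['predicted_time']: tz-aware timestamps at +01:00.
times = pd.Series(pd.to_datetime([
    "2005-02-01 02:40:00+01:00",
    "2005-02-01 02:43:00+01:00",
    "2005-02-01 02:50:00+01:00",
]), name="predicted_time")

# Element-wise apply on the column: each rr is a tz-aware pd.Timestamp,
# and strftime formats it in its own (+01:00) wall time.
via_apply = times.apply(lambda rr: rr.strftime("%H:%M"))

# Vectorized equivalent through the .dt accessor.
via_dt = times.dt.strftime("%H:%M")

print(via_apply.tolist())  # ['02:40', '02:43', '02:50']
```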

EDIT: It looks like you hit a bug in pandas. The issue is triggered by time-zone-aware timestamps in a DataFrame. Calling apply on the Series works, as shown above, and so does using naive timestamps:

df = pd.DataFrame(pd.Series(dfY['predicted_time'].values),
                  columns=['predicted_time'])
df.apply(lambda rr: print(rr['predicted_time']), 1)
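One side effect of the `.values` route: for a tz-aware Series, `.values` returns NumPy `datetime64` values converted to UTC, so the naive timestamps come out one hour earlier than the printed +01:00 wall times. If the local wall time should be kept, `Series.dt.tz_localize(None)` drops the time zone without shifting the clock. A sketch comparing the two (variable names are illustrative):

```python
import pandas as pd

aware = pd.Series(pd.to_datetime(["2005-02-01 02:40:00+01:00"]))

# .values gives a NumPy datetime64 array expressed in UTC.
naive_utc = pd.Series(aware.values)
# tz_localize(None) strips the time zone but keeps the local wall time.
naive_local = aware.dt.tz_localize(None)

df = pd.DataFrame({"predicted_time": naive_local})
# Row-wise apply on a naive column; returns each row's timestamp.
rows = df.apply(lambda rr: rr["predicted_time"], axis=1)

print(naive_utc.iloc[0])    # 2005-02-01 01:40:00
print(naive_local.iloc[0])  # 2005-02-01 02:40:00
```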
