
pandas df.apply TypeError: data type not understood

I'm trying to apply an operation to every value in a datetime series. I've reduced this to a lambda that prints each value to illustrate the problem. This works on another, similar DataFrame but not on this one. Python is version 3.5.1, pandas is version 0.17.1.

Some more padding to satisfy the SO question verbosity requirement.

print(dfY.info())
print(dfY)
dfY.apply(lambda rr: print(rr['predicted_time']), 1)

Output:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 21 entries, 0 to 20
Data columns (total 1 columns):
predicted_time    21 non-null datetime64[ns, pytz.FixedOffset(60)]
dtypes: datetime64[ns, pytz.FixedOffset(60)](1)
memory usage: 336.0 bytes
None
              predicted_time
0  2005-02-01 02:40:00+01:00
1  2005-02-01 02:40:00+01:00
2  2005-02-01 02:40:00+01:00
3  2005-02-01 02:40:00+01:00
4  2005-02-01 02:43:00+01:00
5  2005-02-01 02:43:00+01:00
6  2005-02-01 02:43:00+01:00
<snip>
19 2005-02-01 02:50:00+01:00
20 2005-02-01 02:50:00+01:00

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-8ae0cf570812> in <module>()
      1 print(dfY.info())
      2 print(dfY)
----> 3 dfY.apply(lambda rr: print(rr['predicted_time']), 1)

/.../Projects/Software/TimeTillComplete/venv/lib/python3.5/site-packages/pandas/core/frame.py in apply(self, func, axis, broadcast, raw, reduce, args, **kwds)
   3970                     if reduce is None:
   3971                         reduce = True
-> 3972                     return self._apply_standard(f, axis, reduce=reduce)
   3973             else:
   3974                 return self._apply_broadcast(f, axis)

/.../Projects/Software/TimeTillComplete/venv/lib/python3.5/site-packages/pandas/core/frame.py in _apply_standard(self, func, axis, ignore_failures, reduce)
   4017             # Create a dummy Series from an empty array
   4018             index = self._get_axis(axis)
-> 4019             empty_arr = np.empty(len(index), dtype=values.dtype)
   4020             dummy = Series(empty_arr, index=self._get_axis(axis),
   4021                            dtype=values.dtype)

TypeError: data type not understood

I don't really know what's going on, but as a workaround you can get the expected output by calling apply() on the column:

dfY['predicted_time'].apply(lambda rr: print(rr))

EDIT: It looks like you hit a bug in pandas. The issue is triggered by using time zone aware timestamps in a DataFrame. Using a Series works, as seen above. Using naive timestamps also works:

df = pd.DataFrame(pd.Series(dfY['predicted_time'].values),
                  columns=['predicted_time'])
df.apply(lambda rr: print(rr['predicted_time']), 1)
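
For anyone who wants to try this without the original data, here is a minimal sketch. It assumes a recent pandas release and uses a small, hypothetical stand-in for dfY. The traceback ends at np.empty(..., dtype=values.dtype), and the first part checks that NumPy does reject the pandas tz-aware dtype. The second part runs both workarounds. The names numpy_accepts_dtype, minutes_from_series, and minutes_from_rows are illustrative only, and .dt.tz_localize(None) is a substitute for the .values approach above, not the same operation.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for dfY: one tz-aware column at UTC+01:00.
dfY = pd.DataFrame({
    "predicted_time": pd.to_datetime(
        ["2005-02-01 02:40:00+01:00", "2005-02-01 02:43:00+01:00"]
    )
})

# The traceback stops at np.empty(len(index), dtype=values.dtype).
# NumPy has no tz-aware datetime dtype, so it rejects the pandas dtype.
try:
    np.empty(len(dfY), dtype=dfY["predicted_time"].dtype)
    numpy_accepts_dtype = True
except TypeError:
    numpy_accepts_dtype = False

# Workaround 1: apply on the column (a Series) instead of row-wise on the frame.
minutes_from_series = dfY["predicted_time"].apply(lambda ts: ts.minute)

# Workaround 2: drop the time zone first, then apply row-wise.
# tz_localize(None) keeps the local wall-clock time and removes the offset.
naive = dfY["predicted_time"].dt.tz_localize(None)
df_naive = pd.DataFrame({"predicted_time": naive})
minutes_from_rows = df_naive.apply(lambda rr: rr["predicted_time"].minute, axis=1)
```

Note that the two ways of getting naive timestamps differ: .values on a tz-aware column yields UTC values, while tz_localize(None) keeps the local wall-clock time. Pick whichever matches what the downstream code expects.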
