HDFStore Exception: cannot find the correct atom type : a basic case

I am facing the same problem as the one raised in "How to trouble-shoot HDFStore Exception: cannot find the correct atom type".

I reduced it to an example given in the pandas documentation, "Storing Mixed Types in a Table".

The whole point of this example is to append a DataFrame with some missing values to an HDFStore. When I use the example code, I end up with an atom type error.

df_mixed
Out[103]: 
          A         B         C  bool           datetime64  int  string
0 -0.065617 -0.062644 -0.004758  True  2001-01-02 00:00:00    1  string
1  1.444643  1.664311 -0.189095  True  2001-01-02 00:00:00    1  string
2  0.569412 -0.077504 -0.125590  True  2001-01-02 00:00:00    1  string
3       NaN       NaN  0.563939  True                  NaN    1     NaN
4       NaN       NaN -0.618218  True                  NaN    1     NaN
5       NaN       NaN  1.477307  True                  NaN    1     NaN
6 -0.287331  0.984108 -0.514628  True  2001-01-02 00:00:00    1  string
7 -0.244192  0.239775  0.861359  True  2001-01-02 00:00:00    1  string

store=HDFStore('df.h5')
store.append('df_mixed', df_mixed, min_itemsize={'values':50})

...
Exception: cannot find the correct atom type -> [dtype->object,items->Index([datetime64, string], dtype=object)] object of type 'Timestamp' has no len()

If I enforce dtype for the problematic types (actually the object ones) as suggested in the linked post (Jeff's answer), I still get the same error. What am I missing here?

dtypes = [('datetime64', '|S20'), ('string', '|S20')]

store=HDFStore('df.h5')
store.append('df_mixed', df_mixed, dtype=dtypes, min_itemsize={'values':50})

...
Exception: cannot find the correct atom type -> [dtype->object,items->Index([datetime64, string], dtype=object)] object of type 'Timestamp' has no len()

Thanks for insights

SOLVED

I was using pandas 0.10 and switched to 0.11-dev. As Jeff inferred, the trouble was with NaN vs NaT.

With the former pandas version, the assignment

df_mixed.ix[3:5,['A', 'B', 'string', 'datetime64']] = np.nan

produced

2  0.569412 -0.077504 -0.125590  True  2001-01-02 00:00:00    1  string
3       NaN       NaN  0.563939  True                  NaN    1     NaN

while the latter version produces

2  0.569412 -0.077504 -0.125590  True  2001-01-02 00:00:00    1  string
3       NaN       NaN  0.563939  True                  NaT    1     NaN

The problem is the NaN values in your datetime64[ns] series. These MUST be NaT. How did you construct this frame? What pandas version are you using?
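One way to check whether a frame is affected is to look for object-dtype columns that contain Timestamp values, since a proper datetime column would have a datetime64 dtype instead. The following is only a sketch: the helper name object_columns_with_timestamps and the small frame df_check are made up for illustration and are not part of pandas or of the original example.

```python
import numpy as np
import pandas as pd


def object_columns_with_timestamps(frame):
    """Return names of object-dtype columns holding at least one Timestamp."""
    suspects = []
    for name in frame.columns:
        col = frame[name]
        is_object = col.dtype == object
        if is_object and col.map(lambda v: isinstance(v, pd.Timestamp)).any():
            suspects.append(name)
    return suspects


# Hypothetical frame mimicking the question: Timestamps mixed with float NaN.
df_check = pd.DataFrame({
    "datetime64": pd.Series([pd.Timestamp("2001-01-02"), np.nan], dtype=object),
    "string": pd.Series(["string", np.nan], dtype=object),
    "int": [1, 1],
})

suspects = object_columns_with_timestamps(df_check)
print(suspects)
```

The string column is also object dtype but is not flagged, because it holds no Timestamp values.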

Can you use 0.11-dev? (there are several more options here). Try this:

df['datetime64'] = Series(df['datetime64'],dtype='M8[ns]')
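On more recent pandas versions, a similar conversion can probably be done with pd.to_datetime, which turns the float NaN entries of an object column into NaT. This is a minimal sketch under that assumption, not the method from the answer above; the frame df and the names before and after are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: an object-dtype column mixing Timestamps and float NaN,
# which is the situation that triggered the atom type error.
df = pd.DataFrame({
    "datetime64": pd.Series(
        [pd.Timestamp("2001-01-02"), np.nan, pd.Timestamp("2001-01-02")],
        dtype=object,
    ),
    "int": [1, 1, 1],
})

before = df["datetime64"].dtype

# Convert the column to a real datetime dtype; missing entries become NaT.
df["datetime64"] = pd.to_datetime(df["datetime64"])

after = df["datetime64"].dtype
print(before, "->", after)
print(df)
```

After the conversion the column no longer has object dtype, so it should be a candidate for store.append as in the question.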

In addition, here are some more useful links: http://pandas.pydata.org/pandas-docs/dev/cookbook.html#hdfstore
