Appending to a Pandas DataFrame with categorical columns

Question

How do I append to a Pandas DataFrame that contains predefined columns of categorical datatype?

import pandas as pd

df=pd.DataFrame([],columns=['a','b'])
df['a']=pd.Categorical([],categories=[0,1])

new_df=pd.DataFrame.from_dict({'a':[1],'b':[0]})
df.append(new_df)

The above throws an error:

ValueError: all the input arrays must have same number of dimensions

Update: if the categories are strings rather than ints, appending seems to work:

df['a']=pd.Categorical([],categories=['Left','Right'])

new_df=pd.DataFrame.from_dict({'a':['Left'],'b':[0]})
df.append(new_df)

So, how do I append to DataFrames whose categories are int values? Secondly, I presumed that for binary values (0/1), storing the column as Categorical rather than as a numeric datatype would be more efficient or faster. Is this true? If not, I may not bother converting my columns to Categorical at all.
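One way to check the efficiency assumption is to measure it. The sketch below uses randomly generated 0/1 data (a hypothetical stand-in for the real columns) and compares the memory of the same values stored as `int64`, `int8` and `category`. As far as I know, a categorical column stores small integer codes (`int8` when there are few categories) plus the category labels, so for two categories it should come out about the same size as a plain `int8` column: the saving over `int64` comes from the narrower storage, which downcasting also gives you. If speed matters, time your actual operations separately; this only looks at memory.

```python
import numpy as np
import pandas as pd

n = 1_000_000
rng = np.random.default_rng(0)
values = rng.integers(0, 2, size=n)  # hypothetical binary data (int64 by default)

as_int64 = pd.Series(values)
as_int8 = pd.Series(values, dtype="int8")
as_cat = pd.Series(values, dtype="category")

# Bytes used by the values only (index excluded), counting nested objects
sizes = {
    name: s.memory_usage(index=False, deep=True)
    for name, s in [("int64", as_int64), ("int8", as_int8), ("category", as_cat)]
}
for name, nbytes in sizes.items():
    print(f"{name:>8}: {nbytes:,} bytes")
```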

Answer 1

You have to keep both data frames consistent. Since you converted column a of the first data frame to categorical, you need to do the same for the second data frame. You can do it as follows:

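import pandas as pd

# Empty frame whose column 'a' is categorical with categories 0 and 1
df=pd.DataFrame([],columns=['a', 'b'])
df['a']=pd.Categorical([],categories=[0, 1])

# Give the incoming column 'a' exactly the same categories
new_df=pd.DataFrame.from_dict({'a':[0,1,1,1,0,0],'b':[1,1,8,4,0,0]})
new_df['a'] = pd.Categorical(new_df['a'],categories=[0, 1])

# DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
df = pd.concat([df, new_df], ignore_index=True)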
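A variant of the same idea, as a sketch assuming a reasonably recent pandas: define the categorical dtype once with `pandas.api.types.CategoricalDtype` and apply it to both frames with `astype`, so the two sets of categories cannot drift apart. The layout of the empty frame (an `int64` column `b`) and the name `binary_cat` are illustrative assumptions.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# One shared dtype object, so both frames use identical categories
binary_cat = CategoricalDtype(categories=[0, 1])

# Empty frame with a predefined categorical column 'a' and an int column 'b'
df = pd.DataFrame({
    "a": pd.Series([], dtype=binary_cat),
    "b": pd.Series([], dtype="int64"),
})

new_df = pd.DataFrame({"a": [0, 1, 1, 1, 0, 0], "b": [1, 1, 8, 4, 0, 0]})
# Cast the incoming column to the same categorical dtype before combining
new_df = new_df.astype({"a": binary_cat})

# Both 'a' columns share binary_cat, so the result keeps the categorical dtype
result = pd.concat([df, new_df], ignore_index=True)
print(result.dtypes)
```

Reusing one dtype object is mainly a safeguard: if the categories ever change, they change in one place for every frame that is appended.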

Hope this helps. 希望这可以帮助。
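For contrast, a small sketch of what happens when only one side is categorical: the column dtypes no longer match, so pandas falls back to a non-categorical common dtype and the categorical information is lost. The exact resulting dtype may depend on the pandas version, so the sketch only checks whether the result is still categorical.

```python
import pandas as pd

# Left side has the categorical column; right side is left as plain int64
left = pd.DataFrame({"a": pd.Categorical([0, 1], categories=[0, 1])})
right = pd.DataFrame({"a": [1, 0]})

mixed = pd.concat([left, right], ignore_index=True)
# Mismatched dtypes: the combined column is no longer categorical
print(mixed["a"].dtype, isinstance(mixed["a"].dtype, pd.CategoricalDtype))
```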

Answer 1: accepted, 2017-02-09 21:11:21