
Appending to Pandas DataFrame with categorical columns

How do I append to a Pandas DataFrame that has predefined columns of categorical datatype?

df=pd.DataFrame([],columns=['a','b'])
df['a']=pd.Categorical([],categories=[0,1])

new_df=pd.DataFrame.from_dict({'a':[1],'b':[0]})
df.append(new_df)

The above raises an error:

ValueError: all the input arrays must have same number of dimensions

Update: if the categories are strings rather than ints, appending seems to work:

df['a']=pd.Categorical([],categories=['Left','Right'])

new_df=pd.DataFrame.from_dict({'a':['Left'],'b':[0]})
df.append(new_df)

So, how do I append to DataFrames whose categories are int values? Secondly, I presumed that for binary values (0/1), storing the column as Categorical rather than as a numeric datatype would be more efficient or faster. Is this true? If not, I may not even bother converting my columns to Categorical.

You have to keep both data frames consistent. Since you are converting column a of the first data frame to categorical, you need to do the same for the second data frame. You can do it as follows:

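import pandas as pd

# Empty frame whose column 'a' is categorical with categories 0 and 1
df = pd.DataFrame([], columns=['a', 'b'])
df['a'] = pd.Categorical([], categories=[0, 1])

new_df = pd.DataFrame.from_dict({'a': [0, 1, 1, 1, 0, 0], 'b': [1, 1, 8, 4, 0, 0]})
# Give the incoming column the same categorical dtype before appending
new_df['a'] = pd.Categorical(new_df['a'], categories=[0, 1])

# append returns a new frame, so keep the result.
# DataFrame.append was removed in pandas 2.0; on newer versions use
# pd.concat([df, new_df], ignore_index=True) instead.
df = df.append(new_df, ignore_index=True)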
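
For newer pandas versions, where DataFrame.append is no longer available, the same idea can be sketched with pd.concat. This is only an illustration, not part of the original answer: the name binary_dtype is made up here, and declaring column b as int64 up front is an assumption about the intended data. Sharing one CategoricalDtype object between both frames is one way to guarantee the categories match.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

# One dtype object reused by both frames (the name is illustrative)
binary_dtype = CategoricalDtype(categories=[0, 1])

# Empty frame with the column dtypes declared up front
df = pd.DataFrame({
    'a': pd.Series([], dtype=binary_dtype),
    'b': pd.Series([], dtype='int64'),  # assumed dtype for 'b'
})

new_df = pd.DataFrame({'a': [0, 1, 1, 1, 0, 0], 'b': [1, 1, 8, 4, 0, 0]})
# Cast the incoming column to the same categorical dtype
new_df['a'] = new_df['a'].astype(binary_dtype)

# concat returns a new frame; both 'a' columns share a dtype,
# so the result column stays categorical
result = pd.concat([df, new_df], ignore_index=True)
print(result.dtypes)
```

If the incoming values fall outside the declared categories, astype turns them into NaN rather than raising, so it may be worth checking for missing values after the cast.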

Hope this helps.
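
On the second part of the question (whether Categorical is more efficient for 0/1 data), the answer above does not say, but it is easy to measure. The sketch below compares the memory footprint of the same binary column stored as int64, int8, bool and category. The column size and variable names are arbitrary choices for illustration. With only two categories, pandas stores the categorical codes as int8, so the categorical column should come out close to the int8 and bool versions and well below int64. Speed depends on the operation and is best timed on your own workload.

```python
import numpy as np
import pandas as pd

n_rows = 1_000_000  # arbitrary size for the comparison
rng = np.random.default_rng(0)
values = pd.Series(rng.integers(0, 2, size=n_rows, dtype=np.int64))

# The same 0/1 data under four different dtypes
as_int64 = values
as_int8 = values.astype('int8')
as_bool = values.astype(bool)
as_cat = values.astype('category')

# Report bytes used by the data itself (index excluded)
for name, col in [('int64', as_int64), ('int8', as_int8),
                  ('bool', as_bool), ('category', as_cat)]:
    print(name, col.memory_usage(index=False, deep=True))
```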
