Memory error when using fit_transform with OneHotEncoder
I am trying to one-hot encode the categorical columns in my dataset. I am using the following function:
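import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def create_ohe(df, col):
    # Encode the categories of `col` as integers, shaped as a column vector
    le = LabelEncoder()
    a = le.fit_transform(df[col]).reshape(-1, 1)
    # Dense n_rows x n_categories output (renamed sparse_output in scikit-learn >= 1.2)
    ohe = OneHotEncoder(sparse=False)
    column_names = [col + "_" + str(i) for i in le.classes_]
    # Keep df's index so that pd.concat(..., axis=1) aligns rows correctly
    return pd.DataFrame(ohe.fit_transform(a), columns=column_names, index=df.index)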
I get a MemoryError when I call the function in this loop:
temp = pd.DataFrame()  # collects the encoded columns
for column in categorical_columns:
    temp_df = create_ohe(df_new, column)
    temp = pd.concat([temp, temp_df], axis=1)
Error traceback:
MemoryError Traceback (most recent call last)
<ipython-input-40-9b241e8bf9e6> in <module>
1 for column in categorical_columns:
----> 2 temp_df = create_ohe(df_new, column)
3 temp = pd.concat([temp, temp_df], axis=1)
4 print("\nShape of final df after one hot encoding: ", temp.shape)
<ipython-input-34-1530423fdf06> in create_ohe(df, col)
8 ohe = OneHotEncoder(sparse=False)
9 column_names = [col + "_" + str(i) for i in le.classes_]
---> 10 return (pd.DataFrame(ohe.fit_transform(a), columns=column_names))
MemoryError:
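The failing line is the dense conversion requested by sparse=False. One alternative that this thread does not use is to keep OneHotEncoder's default sparse output, which stores only the nonzero entries, and to encode all categorical columns in a single call. OneHotEncoder also accepts string categories directly, so the LabelEncoder step is not needed. A minimal sketch, assuming the columns contain no missing values; encode_all_sparse is a hypothetical helper name:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder


def encode_all_sparse(df, categorical_columns):
    """Hypothetical helper: one-hot encode several columns into one sparse matrix.

    OneHotEncoder's default output is a SciPy sparse matrix, so only the
    nonzero entries (one per row per encoded column) are stored.
    """
    ohe = OneHotEncoder()
    X = ohe.fit_transform(df[categorical_columns])
    # Build "<column>_<category>" names, matching create_ohe's naming scheme
    names = [
        f"{col}_{cat}"
        for col, cats in zip(categorical_columns, ohe.categories_)
        for cat in cats
    ]
    return X, names


df_new = pd.DataFrame({"color": ["red", "blue", "red"], "size": ["S", "M", "L"]})
X, names = encode_all_sparse(df_new, ["color", "size"])
print(X.shape, X.nnz)  # (3, 5) 6
print(names)
```

If a DataFrame is required, pandas can wrap a SciPy sparse matrix without densifying it via pd.DataFrame.sparse.from_spmatrix(X, columns=names); check that this works with the sparse type returned by your installed scikit-learn version.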
A MemoryError means that either your computer has run out of memory (RAM) or that Python has reached its memory limit; see the question "Memory errors and list limits?".
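To see why the dense route can exhaust RAM, the size of the array returned by OneHotEncoder(sparse=False) can be estimated up front: one row per sample and one float64 column (8 bytes, OneHotEncoder's default dtype) per distinct category. With hypothetical figures of 1,000,000 rows and 5,000 categories, that is 1,000,000 × 5,000 × 8 = 40,000,000,000 bytes, about 40 GB, as a lower bound before any copies made by pd.DataFrame or pd.concat. A rough sketch; dense_ohe_bytes is a hypothetical helper:

```python
import pandas as pd


def dense_ohe_bytes(df, categorical_columns, itemsize=8):
    """Hypothetical helper: bytes needed for a dense one-hot matrix.

    Each row gets one value of `itemsize` bytes (8 for float64) per distinct
    category, summed over all encoded columns. Missing values are not counted.
    """
    n_categories = sum(df[col].nunique() for col in categorical_columns)
    return len(df) * n_categories * itemsize


df_new = pd.DataFrame({"city": ["Paris", "Oslo", "Lima", "Oslo"]})
print(dense_ohe_bytes(df_new, ["city"]))  # 4 rows * 3 categories * 8 bytes = 96
```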
You could try to split the a = le.fit_transform(df_new[col]).reshape(-1,1) step. First run b = le.fit(df_new[col]) so that your label encoder is fitted on the full dataset; then split the data so that you do not transform every row at the same time. Maybe this helps.

If b = le.fit(df_new[col]) does not work either, you have a general memory problem. Replace col with your column names.
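The suggestion above can be sketched as follows: fit the LabelEncoder once on the full column, then transform and expand a limited number of rows at a time. This only saves memory if each chunk is consumed and discarded (for example, written to disk), because concatenating all chunks again needs the full amount. iter_ohe_chunks and chunk_size are hypothetical names, and densifying each chunk with .toarray() is an assumption chosen to mirror sparse=False:

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder


def iter_ohe_chunks(df, col, chunk_size=100_000):
    """Hypothetical helper: fit once on the full column, transform in chunks.

    Yields one dense one-hot DataFrame per chunk, so only chunk_size rows
    are expanded at a time.
    """
    le = LabelEncoder().fit(df[col])  # fit on the full dataset
    n_classes = len(le.classes_)
    # Fit the one-hot encoder on every integer code so all chunks share columns
    ohe = OneHotEncoder().fit(np.arange(n_classes).reshape(-1, 1))
    column_names = [col + "_" + str(c) for c in le.classes_]
    for start in range(0, len(df), chunk_size):
        chunk = df[col].iloc[start:start + chunk_size]
        codes = le.transform(chunk).reshape(-1, 1)
        # Default output is sparse; densify only this chunk
        dense = ohe.transform(codes).toarray()
        yield pd.DataFrame(dense, columns=column_names, index=chunk.index)


df_new = pd.DataFrame({"color": ["red", "blue", "green", "red", "blue"]})
for part in iter_ohe_chunks(df_new, "color", chunk_size=2):
    print(part.shape)  # prints (2, 3), then (2, 3), then (1, 3)
```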
fit_transform is a combination of fit and transform: it fits the encoder on the data and then transforms that same data in a single call.
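The equivalence can be checked directly with LabelEncoder. In both encoders, fit only learns the categories, which is usually small; the output array is allocated during transform, and for OneHotEncoder(sparse=False) that is the n_rows × n_categories array. A small sketch with made-up data:

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

x = ["b", "a", "c", "a"]

# One call: learn the classes, then encode x
codes_one_step = LabelEncoder().fit_transform(x)

# Two calls: fit learns the sorted classes; transform builds the output array
le = LabelEncoder().fit(x)
codes_two_steps = le.transform(x)

print(le.classes_)      # ['a' 'b' 'c']
print(codes_one_step)   # [1 0 2 0]
print(np.array_equal(codes_one_step, codes_two_steps))  # True
```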
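Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For questions, contact yoyou2525@163.com.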