
Memory error when using fit_transform with OneHotEncoder

I am trying to one-hot encode the categorical columns in my dataset. I am using the following function:

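import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

def create_ohe(df, col):
    le = LabelEncoder()
    # Integer-encode the column (use the df argument, not the global df_new)
    a = le.fit_transform(df[col]).reshape(-1, 1)
    # sparse=False returns a dense array (renamed sparse_output in scikit-learn 1.2)
    ohe = OneHotEncoder(sparse=False)
    column_names = [col + "_" + str(i) for i in le.classes_]
    return pd.DataFrame(ohe.fit_transform(a), columns=column_names)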

I get a MemoryError when I call the function in this loop:

for column in categorical_columns:
    temp_df = create_ohe(df_new, column)
    temp = pd.concat([temp, temp_df], axis=1)

Error traceback:

MemoryError                               Traceback (most recent call last)
<ipython-input-40-9b241e8bf9e6> in <module>
      1 for column in categorical_columns:
----> 2     temp_df = create_ohe(df_new, column)
      3     temp = pd.concat([temp, temp_df], axis=1)
      4 print("\nShape of final df after one hot encoding: ", temp.shape)

<ipython-input-34-1530423fdf06> in create_ohe(df, col)
      8     ohe = OneHotEncoder(sparse=False)
      9     column_names = [col + "_" + str(i) for i in le.classes_]
---> 10     return (pd.DataFrame(ohe.fit_transform(a), columns=column_names))

MemoryError: 

A MemoryError means that either your computer has run out of memory (RAM) or the Python process has reached its own memory limit; see "Memory errors and list limits?".
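To see why the dense encoding in create_ohe can exhaust RAM, here is a rough back-of-envelope estimate. The row and category counts below are hypothetical, and the sparse figure assumes SciPy's CSR layout with float64 values and int32 indices (one non-zero per row for a single one-hot column).

```python
def dense_ohe_bytes(n_rows, n_categories, itemsize=8):
    # A dense float64 one-hot matrix stores every cell, zeros included.
    return n_rows * n_categories * itemsize


def sparse_csr_bytes(n_rows, n_nonzero, value_size=8, index_size=4):
    # CSR stores one value and one column index per non-zero,
    # plus n_rows + 1 row pointers.
    return n_nonzero * (value_size + index_size) + (n_rows + 1) * index_size


# Hypothetical column: 1,000,000 rows with 10,000 distinct categories.
print(dense_ohe_bytes(1_000_000, 10_000))    # 80000000000 bytes, about 80 GB
print(sparse_csr_bytes(1_000_000, 1_000_000))  # 16000004 bytes, about 16 MB
```

The dense size scales with rows times categories, so a single high-cardinality column can be enough to trigger the error, whereas the sparse size scales only with the number of rows.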

You could try splitting the a = le.fit_transform(df_new[col]).reshape(-1,1) step. First run b = le.fit(df_new[col]) so that the label encoder is fitted on the full dataset, and then transform the data in chunks instead of all rows at once; this may help. If b = le.fit(df_new[col]) on its own also fails, you have a genuine memory problem. Replace col with your column name.
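A minimal sketch of the chunked approach described above, assuming df_new[col] is a pandas Series. The helper name label_encode_in_chunks and the chunk_size default are hypothetical. The full set of classes is learned once from the whole column, and only the compact integer codes are accumulated chunk by chunk.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder


def label_encode_in_chunks(series, chunk_size=100_000):
    """Hypothetical helper: fit on the full column, transform it in chunks."""
    le = LabelEncoder()
    le.fit(series)  # learn every class from the complete column first
    parts = [
        le.transform(series.iloc[start:start + chunk_size])
        for start in range(0, len(series), chunk_size)
    ]
    return np.concatenate(parts), le


codes, le = label_encode_in_chunks(pd.Series(["a", "b", "a", "c"]), chunk_size=3)
print(codes.tolist())  # [0, 1, 0, 2]
```

Chunking lowers the peak memory of the transform step itself; the subsequent one-hot step still needs sparse output if the number of categories is large.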

fit_transform is a combination of fit and transform: it fits the encoder and then transforms the same data in one call.
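A small illustration, using made-up colour data, that calling fit_transform gives the same encoding as calling fit and then transform separately. Splitting the two calls is what allows the fitted encoder to be reused on other data, such as chunks or a test set.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = np.array([["red"], ["green"], ["red"], ["blue"]])

# One call: learn the categories and encode X.
combined = OneHotEncoder().fit_transform(X)

# Two calls: learn the categories first, then encode X.
enc = OneHotEncoder()
enc.fit(X)
separate = enc.transform(X)

print(np.array_equal(combined.toarray(), separate.toarray()))  # True
```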
