Encoding and saving multiple arrays in a for loop in Python

Question

I am preprocessing data for a deep neural network, and I have variables that I need to one-hot encode. So far, this is what I am doing, and it is working fine. However, I was wondering if I could implement this in a for loop, as that may be more efficient?

# Only Educational Establishment Type
X6 = X.drop(['Sex', 'Applicant Domicile (High Level)', 'Applicant Domicile (Low Level)', 'Age Band (5 Levels)', 'POLAR3 Quintile', 'Subject Group (Detailed Level)', 'Subject Group (Summary Level)'], axis=1)
X6 = onehotencoder.fit_transform((X6).apply(encoder.fit_transform)).toarray()
# print(X6.head)

# Only Subject Group (Detailed Level)
X7 = X.drop(['Sex', 'Applicant Domicile (High Level)', 'Applicant Domicile (Low Level)', 'Age Band (5 Levels)', 'POLAR3 Quintile', 'Educational Establishment Type', 'Subject Group (Summary Level)'], axis=1)
X7 = onehotencoder.fit_transform((X7).apply(encoder.fit_transform)).toarray()
# print(X7.head)

# Only Subject Group (Summary Level)
X8 = X.drop(['Sex', 'Applicant Domicile (High Level)', 'Applicant Domicile (Low Level)', 'Age Band (5 Levels)', 'POLAR3 Quintile', 'Educational Establishment Type', 'Subject Group (Detailed Level)'], axis=1)
X8 = onehotencoder.fit_transform((X8).apply(encoder.fit_transform)).toarray()

I also need to save each encoded array in .npy format so I can load it later. I attempted to implement all of this in a for loop as follows; however, it does not save a file for each array as desired, and it does not actually update the existing dataframes into one-hot encoded arrays.

all_x = [X, X1, X2, X3, X4, X5, X6, X7, X8]

idx = 0
for num_x in all_x: 
   encoder=LabelEncoder()
   (num_x) = (num_x).apply(encoder.fit_transform)   
   onehotencoder = OneHotEncoder(categorical_features='all')
   (num_x) = onehotencoder.fit_transform(num_x).toarray()
   np.save('X%d' % idx, num_x)
   idx+=1
   print(num_x)

Answer 1

You're actually not updating the existing dataframes in either version; you're just creating new dataframes.

The difference is that in the repetitive code, you're rebinding each of the variables X, X1, etc. to one of those new dataframes, while in your loop, you're rebinding num_x over and over again to each new dataframe, so at the end, only the last one is stored anywhere.
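A minimal pure-Python illustration of that difference, using plain lists as stand-ins for the dataframes (the names frames, item and unchanged are made up for this sketch). Assigning to the loop variable only rebinds that name; writing back through the list index changes what the list holds.

```python
frames = [[1, 2], [3, 4], [5, 6]]  # stand-ins for X, X1, X2

# Rebinding the loop variable builds a new object, but the list never sees it
for item in frames:
    item = [x * 10 for x in item]
unchanged = frames == [[1, 2], [3, 4], [5, 6]]
print(unchanged)  # True

# Storing the result back by index replaces the list's entries
for idx, item in enumerate(frames):
    frames[idx] = [x * 10 for x in item]
print(frames)  # [[10, 20], [30, 40], [50, 60]]
```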

Ideally, you probably want to rewrite your code to not have those separate variables in the first place, and only have a list of 9 dataframes, or maybe a dict of 9 dataframes keyed by the names X, X1, etc., if the names really are meaningful.
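A minimal sketch of the dict idea, assuming each of X6, X7 and X8 is meant to hold exactly one column of X (which is what the drop(...) calls in the question leave behind when every other column is listed). The rows of X below are made up; only the column names come from the question, and subsets is a hypothetical name.

```python
import pandas as pd

# Made-up rows; only the column names come from the question
X = pd.DataFrame({
    "Sex": ["F", "M", "F"],
    "Educational Establishment Type": ["State", "Independent", "State"],
    "Subject Group (Summary Level)": ["Arts", "Science", "Science"],
})

# One dict entry per column instead of separate X6, X7, X8 variables.
# X[[name]] (double brackets) keeps a one-column DataFrame, not a Series.
subsets = {name: X[[name]] for name in X.columns}

for name, frame in subsets.items():
    print(name, frame.shape)
```

Because the keys are the column names, there is no need to keep a separate list of "which columns to drop" in sync with each variable.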

But if that's too big of a change, you can do something like this:

import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

all_x = [X, X1, X2, X3, X4, X5, X6, X7, X8]
idx = 0
for num_x in all_x:
    encoder = LabelEncoder()
    num_x = num_x.apply(encoder.fit_transform)
    # categorical_features was removed in scikit-learn 0.22; the default
    # already encodes every column
    onehotencoder = OneHotEncoder()
    # store the encoded array back into the list
    all_x[idx] = num_x = onehotencoder.fit_transform(num_x).toarray()
    np.save('X%d' % idx, num_x)  # writes X0.npy, X1.npy, ...
    idx += 1
    print(num_x)
# store the list of encoded arrays back into the original variable names
X, X1, X2, X3, X4, X5, X6, X7, X8 = all_x

There are ways you could make this cleaner (e.g., using for idx, num_x in enumerate(all_x): instead of a manual counter), but I stuck as closely as possible to your existing code.
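A sketch of the enumerate version, including loading a saved file back. To keep it self-contained it swaps LabelEncoder + OneHotEncoder for a small NumPy helper; one_hot, encode_and_save and the sample data are hypothetical. With scikit-learn 0.20 or later you could call OneHotEncoder().fit_transform(num_x).toarray() in place of one_hot(num_x), since OneHotEncoder accepts string columns directly.

```python
import os
import tempfile

import numpy as np


def one_hot(frame) -> np.ndarray:
    """Hypothetical stand-in for LabelEncoder + OneHotEncoder, NumPy only.

    One-hot encodes every column of a 2-D table and stacks the results side by side.
    """
    table = np.asarray(frame)
    blocks = []
    for col in table.T:
        # np.unique returns the sorted categories and each row's category index
        categories, codes = np.unique(col, return_inverse=True)
        # Row k of the identity matrix is the one-hot vector for category k
        blocks.append(np.eye(len(categories))[codes])
    return np.hstack(blocks)


def encode_and_save(all_x, out_dir):
    """Encode each table, store it back into the list, and save it as X<idx>.npy."""
    for idx, num_x in enumerate(all_x):
        all_x[idx] = one_hot(num_x)
        np.save(os.path.join(out_dir, f"X{idx}.npy"), all_x[idx])
    return all_x


# Made-up tables standing in for the question's dataframes
data = [
    [["F"], ["M"], ["F"]],
    [["State", "Arts"], ["Independent", "Science"], ["State", "Science"]],
]
with tempfile.TemporaryDirectory() as tmp:
    encoded = encode_and_save(data, tmp)
    # Reload one of the saved arrays to check the round trip
    reloaded = np.load(os.path.join(tmp, "X1.npy"))
print(reloaded)
```

enumerate supplies the index that the manual idx counter provided, so the store-back and the file name stay in step without extra bookkeeping.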

Answer 1: score 0, posted 2018-06-01 00:10:30