Encoding and saving multiple arrays in a for loop in Python

I am preprocessing data for a deep neural network and I have variables that I need to one-hot encode. So far, this is what I am doing, and it works fine. However, I was wondering whether I could implement this in a for loop, since that might be more efficient?

# Only Educational Establishment Type
X6 = X.drop(['Sex', 'Applicant Domicile (High Level)', 'Applicant Domicile (Low Level)', 'Age Band (5 Levels)', 'POLAR3 Quintile', 'Subject Group (Detailed Level)', 'Subject Group (Summary Level)'], axis=1)
X6 = onehotencoder.fit_transform((X6).apply(encoder.fit_transform)).toarray()
# print(X6.head)

# Only Subject Group (Detailed Level)
X7 = X.drop(['Sex', 'Applicant Domicile (High Level)', 'Applicant Domicile (Low Level)', 'Age Band (5 Levels)', 'POLAR3 Quintile', 'Educational Establishment Type', 'Subject Group (Summary Level)'], axis=1)
X7 = onehotencoder.fit_transform((X7).apply(encoder.fit_transform)).toarray()
# print(X7.head)

# Only Subject Group (Summary Level)
X8 = X.drop(['Sex', 'Applicant Domicile (High Level)', 'Applicant Domicile (Low Level)', 'Age Band (5 Levels)', 'POLAR3 Quintile', 'Educational Establishment Type', 'Subject Group (Detailed Level)'], axis=1)
X8 = onehotencoder.fit_transform((X8).apply(encoder.fit_transform)).toarray()

I also need to save each encoded array in .npy format so I can load it again later. I attempted to implement all of this in a for loop as follows; however, it does not save a file for each array as desired, and it does not actually turn the existing dataframes into one-hot encoded arrays.

all_x = [X, X1, X2, X3, X4, X5, X6, X7, X8]

idx = 0
for num_x in all_x: 
   encoder=LabelEncoder()
   (num_x) = (num_x).apply(encoder.fit_transform)   
   onehotencoder = OneHotEncoder(categorical_features='all')
   (num_x) = onehotencoder.fit_transform(num_x).toarray()
   np.save('X%d' % idx, num_x)
   idx+=1
   print(num_x)

You're not actually updating the existing dataframes in either version; you're just creating new objects (NumPy arrays, once .toarray() has run).

The difference is that in the repetitive code, you're rebinding each of the variables X, X1, etc. to its own new result, while in your loop you're rebinding num_x over and over again, so at the end only the last result is stored anywhere.
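The rebinding behaviour can be seen without pandas at all. The following is a minimal sketch using a plain list of integers (the names values, after_rebinding, and last_item are illustrative only, not from the question):

```python
values = [1, 2, 3]

# Assigning to the loop variable only rebinds the name "item";
# the list still refers to the original objects.
for item in values:
    item = item * 10

after_rebinding = list(values)
last_item = item
print(after_rebinding)  # [1, 2, 3]
print(last_item)        # 30

# Assigning through the index stores each new object in the list.
for i, item in enumerate(values):
    values[i] = item * 10

print(values)  # [10, 20, 30]
```

The same applies to num_x in the question: each assignment gives the name a new target and leaves all_x and X, X1, etc. untouched.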

Ideally, you probably want to rewrite your code so that those separate variables don't exist in the first place, and you have just a list of 9 dataframes, or maybe a dict of 9 dataframes keyed by the names X, X1, etc., if the names really are meaningful.
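As a rough sketch of the dict approach, assuming toy data in place of the real X and a hypothetical helper named one_hot (neither is from the question), it might look like this. The categorical_features argument is left out on the assumption that a newer scikit-learn release is installed, where that parameter is no longer accepted:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy data standing in for the question's dataframe X.
X = pd.DataFrame({
    "Sex": ["F", "M", "F", "M"],
    "Educational Establishment Type": ["School", "College", "School", "Other"],
})


def one_hot(frame):
    """Label-encode each column, then one-hot encode into a dense array."""
    labelled = frame.apply(LabelEncoder().fit_transform)
    return OneHotEncoder().fit_transform(labelled).toarray()


# One single-column dataframe per feature, keyed by a meaningful name
# instead of separate variables such as X6, X7, X8.
frames = {name: X[[name]] for name in X.columns}

# Build a second dict holding the encoded arrays under the same keys.
encoded = {name: one_hot(frame) for name, frame in frames.items()}

for name, array in encoded.items():
    print(name, array.shape)
```

With a dict, nothing needs to be unpacked back into individual variables afterwards; each array is looked up by name, for example encoded["Sex"].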

But if that's too big of a change, you can do something like this:

all_x = [X, X1, X2, X3, X4, X5, X6, X7, X8]
idx = 0
for num_x in all_x:
   encoder=LabelEncoder()
   (num_x) = (num_x).apply(encoder.fit_transform)   
   onehotencoder = OneHotEncoder(categorical_features='all')
   # store the new encoded array back into the list
   all_x[idx] = num_x = onehotencoder.fit_transform(num_x).toarray()
   np.save('X%d' % idx, num_x)
   idx+=1
   print(num_x)
# store the encoded arrays back into the original variable names
X, X1, X2, X3, X4, X5, X6, X7, X8 = all_x

There are ways to make this cleaner (e.g., using for idx, num_x in enumerate(all_x):), but I stuck as closely as possible to your existing code.
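For reference, a sketch of the enumerate variant could look like the following. The two small dataframes and the temporary output directory are assumptions made so the snippet runs on its own, and categorical_features is omitted on the assumption of a newer scikit-learn release that no longer accepts it:

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical stand-ins for two of the question's dataframes.
X = pd.DataFrame({"Sex": ["F", "M", "F"]})
X1 = pd.DataFrame({"POLAR3 Quintile": ["1", "2", "3"]})
all_x = [X, X1]

# Assumption: a temporary directory; any writable path would do.
out_dir = tempfile.mkdtemp()

for idx, num_x in enumerate(all_x):
    labelled = num_x.apply(LabelEncoder().fit_transform)
    # Store the encoded array back into the list by index.
    all_x[idx] = OneHotEncoder().fit_transform(labelled).toarray()
    # np.save appends the .npy extension, giving X0.npy, X1.npy, ...
    np.save(os.path.join(out_dir, 'X%d' % idx), all_x[idx])

# Rebind the original names to the encoded arrays.
X, X1 = all_x
print(X.shape, X1.shape)
```

enumerate removes the manual idx counter, and assigning through all_x[idx] is what keeps each result instead of overwriting num_x.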
