I have two dataframes, DFa and DFb. DFa contains 4 columns: Date, macro_A, macro_B, macro_C, whereas DFb contains 3 columns: Name, Region, Transformation.
What I am trying to achieve is to check whether the column names of DFa are contained within DFb.Name; if so, I extract the corresponding Transformation method from DFb. Depending on the transformation method, I then transform the DFa column accordingly.
import pandas as pd

DFa = pd.DataFrame({'Date': [2010, 2011, 2012, 2013],
                    'macro_A': [0.23, 0.20, 0.13, 0.19],
                    'macro_B': [0.23, 0.20, 0.13, 0.19],
                    'macro_C': [0.23, 0.20, 0.13, 0.19]}, index=[1, 2, 3, 4])
DFb = pd.DataFrame({'Name': ['macro_C', 'macro_B', 'macro_D', 'macro_A', 'macro_E'],
                    'Region': ['UK', 'UK', 'US', 'UK', 'EUR'],
                    'Transformation': ['non', 'STD', 'STD', 'STD', 'non']},
                   index=[1, 2, 3, 4, 5])
For example, I check that the macro_A column from DFa exists within DFb.Name. Then I check DFb.Transformation and find that the value is STD, which means I need to transform (standardize) DFa.macro_A.
On the other hand, macro_C from DFa also exists within DFb.Name, but DFb.Transformation for macro_C is non. Therefore, I leave DFa.macro_C as it stands.
I have built this code:
for j, k in enumerate(DFa.columns):
    for i, x in enumerate(DFb['Name']):
        if x == k:
            if DFb.ix[i, 'Transformation'] == 'STD':
                DFa.iloc[:, j] = preprocessing.scale(DFa.iloc[: j])
How can I make my code more efficient?
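The following corrected code works:
from sklearn import preprocessing

min_max_scaler = preprocessing.MinMaxScaler()
for j, k in enumerate(DFa.columns):
    for i, x in enumerate(DFb.Name):
        # Scale only the columns whose matching DFb row is marked 'STD'
        if x == k and DFb.iloc[i, :]['Transformation'] == 'STD':
            # Note: passing a 1D column is deprecated and rejected by later scikit-learn
            # releases; there, pass DFa.iloc[:, [j]] and call .ravel() on the result.
            DFa.iloc[:, j] = min_max_scaler.fit_transform(DFa.iloc[:, j])
print(DFa)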
Output:
...some DEPRECATION_MSG warnings...
   Date  macro_A  macro_B  macro_C
1  2010      1.0      1.0     0.23
2  2011      0.7      0.7     0.20
3  2012      0.0      0.0     0.13
4  2013      0.6      0.6     0.19
macro_A and macro_B have been scaled (to the [0, 1] range, since MinMaxScaler is used), but macro_C has not.
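Since the question asks for standardization and for something more efficient than nested loops, here is a minimal loop-free sketch. It assumes plain pandas arithmetic is an acceptable stand-in for preprocessing.scale (zero mean, unit population variance); the names std_names, std_cols and selected are illustrative and not part of the original code.

```python
import pandas as pd

DFa = pd.DataFrame({'Date': [2010, 2011, 2012, 2013],
                    'macro_A': [0.23, 0.20, 0.13, 0.19],
                    'macro_B': [0.23, 0.20, 0.13, 0.19],
                    'macro_C': [0.23, 0.20, 0.13, 0.19]}, index=[1, 2, 3, 4])
DFb = pd.DataFrame({'Name': ['macro_C', 'macro_B', 'macro_D', 'macro_A', 'macro_E'],
                    'Region': ['UK', 'UK', 'US', 'UK', 'EUR'],
                    'Transformation': ['non', 'STD', 'STD', 'STD', 'non']},
                   index=[1, 2, 3, 4, 5])

# Names flagged 'STD' in DFb (illustrative variable names, not from the original post)
std_names = DFb.loc[DFb['Transformation'] == 'STD', 'Name']

# Keep only those names that are actually columns of DFa
std_cols = DFa.columns.intersection(std_names)

# Standardize all selected columns in one step.
# Assumption: ddof=0 (population standard deviation) is used to mirror preprocessing.scale.
selected = DFa[std_cols]
DFa[std_cols] = (selected - selected.mean()) / selected.std(ddof=0)

print(DFa)
```

The lookup is done once with a boolean mask instead of comparing every column name against every row of DFb, and columns such as Date or macro_C are simply never selected.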
I think you can avoid the enumerate and iloc calls by using the column names. I would also suggest using a string->lambda map to store the operations and applying the selected operation to each column. This helps when you have multiple operation strings:
operations = {'STD': lambda col: min_max_scaler.fit_transform(col.to_frame()).ravel(),
              'non': lambda col: col}  # Operations map; each operation takes a whole column
for colName in DFa.columns:
    # Get the transform string by matching the column name against the Name column
    transformStr = DFb.Transformation[DFb.Name == colName]
    if transformStr.shape[0] == 0:  # Column not listed in DFb (e.g. Date): leave it unchanged
        continue
    if transformStr.shape[0] > 1:  # Make sure that only one operation is selected
        raise Exception('Invalid transform string %s' % transformStr.tolist())
    DFa[colName] = operations[transformStr.iloc[0]](DFa[colName])
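The same operations-map idea can be sketched without scikit-learn, which may be handy for trying it out in isolation. This is only an illustration under stated assumptions: the plain-pandas min-max formula stands in for MinMaxScaler, names in DFb.Name are assumed to be unique, and lookup, col_name and transform are hypothetical names.

```python
import pandas as pd

DFa = pd.DataFrame({'Date': [2010, 2011, 2012, 2013],
                    'macro_A': [0.23, 0.20, 0.13, 0.19],
                    'macro_B': [0.23, 0.20, 0.13, 0.19],
                    'macro_C': [0.23, 0.20, 0.13, 0.19]}, index=[1, 2, 3, 4])
DFb = pd.DataFrame({'Name': ['macro_C', 'macro_B', 'macro_D', 'macro_A', 'macro_E'],
                    'Region': ['UK', 'UK', 'US', 'UK', 'EUR'],
                    'Transformation': ['non', 'STD', 'STD', 'STD', 'non']},
                   index=[1, 2, 3, 4, 5])

# Column-level operations. Assumption: the min-max formula below stands in for MinMaxScaler.
operations = {
    'STD': lambda col: (col - col.min()) / (col.max() - col.min()),
    'non': lambda col: col,
}

# One lookup table, Name -> Transformation (assumes names in DFb are unique)
lookup = DFb.set_index('Name')['Transformation']

# reindex aligns the lookup to DFa's columns; columns absent from DFb
# (here: Date) become NaN and are dropped, so they are left untouched.
for col_name, transform in lookup.reindex(DFa.columns).dropna().items():
    DFa[col_name] = operations[transform](DFa[col_name])

print(DFa)
```

Adding another transformation then only requires a new entry in the operations dictionary; the loop itself does not change.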