Python: If column contains string, then extract another column's value

I have two dataframes, DFa and DFb. DFa contains four columns: Date, macro_A, macro_B, macro_C. DFb contains three columns: Name, Region, Transformation.

I want to check whether each column name of DFa appears in DFb.Name; if it does, I extract the corresponding Transformation method from DFb and transform the DFa column accordingly.

import pandas as pd

DFa = pd.DataFrame({'Date': [2010, 2011, 2012, 2013],
                    'macro_A': [0.23, 0.20, 0.13, 0.19],
                    'macro_B': [0.23, 0.20, 0.13, 0.19],
                    'macro_C': [0.23, 0.20, 0.13, 0.19]}, index=[1, 2, 3, 4])

DFb = pd.DataFrame({'Name': ['macro_C', 'macro_B', 'macro_D', 'macro_A', 'macro_E'],
                    'Region': ['UK', 'UK', 'US', 'UK', 'EUR'],
                    'Transformation': ['non', 'STD', 'STD', 'STD', 'non']},
                   index=[1, 2, 3, 4, 5])

For example, I check that the macro_A column from DFa exists in DFb.Name. I then see that its value in DFb.Transformation is STD, which means I need to transform (standardize) DFa.macro_A.

On the other hand, macro_C from DFa also exists in DFb.Name, but its DFb.Transformation value is non. Therefore, I leave DFa.macro_C as it stands.

I have built this code:

for j, k in enumerate(DFa.columns):
    for i, x in enumerate(DFb['Name']):
        if x == k:
            if DFb.ix[i, 'Transformation'] == 'STD':
                DFa.iloc[:, j] = preprocessing.scale(DFa.iloc[: j])

How can I make my code more efficient?

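The following corrected code works. Note that it uses MinMaxScaler, which rescales each column to the range [0, 1], rather than preprocessing.scale: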

from sklearn import preprocessing

min_max_scaler = preprocessing.MinMaxScaler()
for j, k in enumerate(DFa.columns):
    for i, x in enumerate(DFb.Name):
        if x == k and DFb.iloc[i]['Transformation'] == 'STD':
            # Pass the column as 2-D (a one-column frame): 1-D input raised
            # deprecation warnings in older scikit-learn and is rejected in newer versions
            DFa.iloc[:, j] = min_max_scaler.fit_transform(DFa.iloc[:, [j]]).ravel()

print(DFa)

Output:

   Date  macro_A  macro_B  macro_C
1  2010      1.0      1.0     0.23
2  2011      0.7      0.7     0.20
3  2012      0.0      0.0     0.13
4  2013      0.6      0.6     0.19

macro_A and macro_B have been scaled but not macro_C.
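
The same result can probably be reached without either loop. The following is only a sketch, not the answerer's code: it assumes every name occurs at most once in DFb.Name, and it writes the min-max formula in plain pandas arithmetic instead of calling MinMaxScaler. The names std_names and cols are made up for illustration.

```python
import pandas as pd

DFa = pd.DataFrame({'Date': [2010, 2011, 2012, 2013],
                    'macro_A': [0.23, 0.20, 0.13, 0.19],
                    'macro_B': [0.23, 0.20, 0.13, 0.19],
                    'macro_C': [0.23, 0.20, 0.13, 0.19]}, index=[1, 2, 3, 4])

DFb = pd.DataFrame({'Name': ['macro_C', 'macro_B', 'macro_D', 'macro_A', 'macro_E'],
                    'Region': ['UK', 'UK', 'US', 'UK', 'EUR'],
                    'Transformation': ['non', 'STD', 'STD', 'STD', 'non']},
                   index=[1, 2, 3, 4, 5])

# Names that DFb flags as 'STD' (illustrative variable name)
std_names = DFb.loc[DFb['Transformation'] == 'STD', 'Name']

# Keep only the flagged names that are actually columns of DFa
# (macro_D is flagged in DFb but does not exist in DFa)
cols = DFa.columns[DFa.columns.isin(std_names)]

# Min-max scale all selected columns at once: (x - min) / (max - min), column-wise
selected = DFa[cols]
DFa[cols] = (selected - selected.min()) / (selected.max() - selected.min())

print(DFa)
```

Because the selection is done with isin, columns such as Date that have no entry in DFb.Name are simply left out rather than needing a special case.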

You can avoid enumerate and iloc by working with the column names directly. I would also suggest storing the operations in a string -> function map and looking up the one to run for each column. This helps when you have several transformation strings.

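from sklearn import preprocessing
min_max_scaler = preprocessing.MinMaxScaler()

# Operations map: each function takes a whole column and returns the transformed column
operations = {'STD': lambda col: min_max_scaler.fit_transform(col.to_frame()).ravel(),
              'non': lambda col: col}

for colName in DFa.columns:
    # Get the transform string by matching the column name with the Name column
    transformStr = DFb.loc[DFb.Name == colName, 'Transformation']

    if transformStr.empty:  # No entry in DFb (e.g. 'Date'): leave the column untouched
        continue
    if len(transformStr) > 1:  # Make sure that only one operation is selected
        raise ValueError('Ambiguous transform for %s: %s' % (colName, transformStr.tolist()))

    # Call the operation on the whole column; Series.apply would call it once per element
    DFa[colName] = operations[transformStr.iloc[0]](DFa[colName])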
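
If STD is meant as standardization (zero mean, unit variance), as the question's use of preprocessing.scale suggests, the same map idea could be sketched in plain pandas as below. This is an assumption about the intent, not something either answer states. The lookup Series transform_of is a hypothetical name, and the sketch assumes the names in DFb.Name are unique.

```python
import pandas as pd

DFa = pd.DataFrame({'Date': [2010, 2011, 2012, 2013],
                    'macro_A': [0.23, 0.20, 0.13, 0.19],
                    'macro_B': [0.23, 0.20, 0.13, 0.19],
                    'macro_C': [0.23, 0.20, 0.13, 0.19]}, index=[1, 2, 3, 4])

DFb = pd.DataFrame({'Name': ['macro_C', 'macro_B', 'macro_D', 'macro_A', 'macro_E'],
                    'Region': ['UK', 'UK', 'US', 'UK', 'EUR'],
                    'Transformation': ['non', 'STD', 'STD', 'STD', 'non']},
                   index=[1, 2, 3, 4, 5])

# Operations map: each function takes a whole column (a Series) and returns one
operations = {
    # ddof=0 is the population standard deviation, which preprocessing.scale also uses
    'STD': lambda s: (s - s.mean()) / s.std(ddof=0),
    'non': lambda s: s,
}

# Name -> Transformation lookup (assumes each name appears only once in DFb)
transform_of = DFb.set_index('Name')['Transformation']

# Only visit columns of DFa that have an entry in DFb; 'Date' is skipped
for col in DFa.columns.intersection(transform_of.index):
    DFa[col] = operations[transform_of[col]](DFa[col])

print(DFa)
```

Supporting a further transformation then only requires one more entry in operations; the loop itself stays unchanged.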
