How to execute the script only under specific conditions using Python

Question

I have been working on a script that performs cleaning operations on various columns.

I need to run those cleaning steps only when a specific condition is met.

For example:

if flag == 'Not feasible':
    # process the remaining steps

Input Data:

name   age   Contact    col4   col5  col6  flag
NKJ    48!   96754789   8886H  AHBZ        Not feasible
Tom    27    98468300   ^686H  ANKZ        feasible
Mike   28@   78915359   3256H  AK9Z        Not feasible
NKJ    48!   96754789   8886H  AHBZ        Not feasible
JKN8   35    96451188   3566H  NK4Z        Not feasible

I want to run all the cleaning steps only on rows where flag == 'Not feasible'.

The script I am trying to use:

if flag == 'Not feasible':
    df['age'] = df['age'].replace(r'[^\d.]', '', regex=True).astype(float)
    df[['col4','col5']] = df[['col4','col5']].apply(lambda x: x.astype(str).str.replace(r'\W', '', regex=True))
    df['Contact'] = df['Contact'].replace(r'[^\d.]', '', regex=True).astype(float)

There are several more cleaning lines like these, but I don't understand how to run them only for rows where flag == 'Not feasible'.

When I use the condition above, I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Please suggest a solution.
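The ValueError above appears when `if` is given a pandas Series, for example `df['flag'] == 'Not feasible'` (an assumption about how `flag` was defined in the question). Comparing a column with a scalar yields one boolean per row, while `if` needs a single True/False. A minimal sketch with a made-up three-row frame:

```python
import pandas as pd

# Hypothetical minimal frame with only the 'flag' column from the question
df = pd.DataFrame({"flag": ["Not feasible", "feasible", "Not feasible"]})

# Comparing a column to a scalar gives one boolean per row, not a single bool
mask = df["flag"] == "Not feasible"
print(mask.tolist())  # [True, False, True]

# Using that Series directly in an `if` raises the ValueError from the question
error_name = None
try:
    if mask:
        pass
except ValueError as exc:
    error_name = type(exc).__name__
print(error_name)  # ValueError

# Used as a row selector instead, the mask picks out the matching rows
print(df.loc[mask].index.tolist())  # [0, 2]
```

This is why both answers below use the comparison as a row selector with .loc rather than as an if condition.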

Answer 1

To save you the effort of amending a large number of cleaning scripts for various columns, you can do it in these steps:

1. First, extract the rows that are not to be processed into another dataframe.
2. Re-define df as a copy of only the rows to be processed.
3. Run your cleaning scripts with their original syntax.
4. Concatenate the unprocessed rows (from step 1) back onto the cleaned results (from step 3), then call .sort_index() to restore the original row order.

import pandas as pd

df1 = df.loc[df['flag'] != 'Not feasible']               # Step 1
df = df.loc[df['flag'] == 'Not feasible'].copy()         # Step 2

# Run your cleaning code with the original syntax        # Step 3
df['age'] = df['age'].replace(r'[^\d.]', '', regex=True).astype(float)
df[['col4','col5']] = df[['col4','col5']].apply(lambda x: x.astype(str).str.replace(r'\W', '', regex=True))
df['Contact'] = df['Contact'].replace(r'[^\d.]', '', regex=True).astype(float)

df = pd.concat([df, df1]).sort_index()                   # Step 4

Result:

print(df)


   name   age     Contact   col4  col5 col6          flag
0   NKJ  48.0  96754789.0  8886H  AHBZ       Not feasible
1   Tom    27  98468300.0  ^686H  ANKZ           feasible
2  Mike  28.0  78915359.0  3256H  AK9Z       Not feasible
3   NKJ  48.0  96754789.0  8886H  AHBZ       Not feasible
4  JKN8  35.0  96451188.0  3566H  NK4Z       Not feasible

The non-processed rows are combined back with the cleaned rows, and the original row order is restored with .sort_index() after pd.concat().
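If the list of cleaning steps grows long, one option is to collect them in a single function and apply it only to the selected rows, so the split and recombine (steps 1, 2 and 4) is written once. The sketch below assumes a hypothetical helper named `clean` and rebuilds the question's input data by hand (col6 is left out because it is empty); it recombines with pd.concat plus .sort_index() as in the answer.

```python
import pandas as pd

# Input data rebuilt from the question (col6 left out because it is empty)
df = pd.DataFrame({
    "name": ["NKJ", "Tom", "Mike", "NKJ", "JKN8"],
    "age": ["48!", "27", "28@", "48!", "35"],
    "Contact": [96754789, 98468300, 78915359, 96754789, 96451188],
    "col4": ["8886H", "^686H", "3256H", "8886H", "3566H"],
    "col5": ["AHBZ", "ANKZ", "AK9Z", "AHBZ", "NK4Z"],
    "flag": ["Not feasible", "feasible", "Not feasible", "Not feasible", "Not feasible"],
})


def clean(part: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper holding all the cleaning steps; works on a copy."""
    part = part.copy()
    # Keep only digits and dots, then convert to float
    part["age"] = part["age"].astype(str).str.replace(r"[^\d.]", "", regex=True).astype(float)
    part["Contact"] = part["Contact"].astype(str).str.replace(r"[^\d.]", "", regex=True).astype(float)
    # Strip non-word characters from col4 and col5
    for col in ["col4", "col5"]:
        part[col] = part[col].astype(str).str.replace(r"\W", "", regex=True)
    return part


mask = df["flag"] == "Not feasible"
# Clean only the selected rows, then put the untouched rows back in original order
result = pd.concat([clean(df[mask]), df[~mask]]).sort_index()
print(result)
```

Because `clean` works on a copy, the original df is left unchanged and `result` holds the combined output. As in the answer's printed result, the age column ends up holding floats for cleaned rows and the original string for the untouched row.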

Answer 2

Did you try filtering with a boolean mask? For example:

df.loc[df["flag"]=="Not feasible", 'age'] = df.loc[df["flag"]=="Not feasible", 'age'].replace(r'[^\d.]', '', regex=True).astype(float)

Do the same for all the other transformations you want to apply to the df.
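Computing the mask once and reusing it keeps each line short when many columns need cleaning. A minimal sketch of this boolean-mask approach on a few rows from the question; the variable name `mask` is just illustrative, and the frame is built with dtype=object so the age column can hold cleaned floats next to the untouched strings.

```python
import pandas as pd

# A few rows from the question's input data; dtype=object lets a column
# hold cleaned floats alongside untouched strings
df = pd.DataFrame({
    "age": ["48!", "27", "28@"],
    "col4": ["8886H", "^686H", "3256H"],
    "flag": ["Not feasible", "feasible", "Not feasible"],
}, dtype=object)

# Build the row selector once
mask = df["flag"] == "Not feasible"

# Each assignment only touches the rows where mask is True
df.loc[mask, "age"] = df.loc[mask, "age"].str.replace(r"[^\d.]", "", regex=True).astype(float)
df.loc[mask, "col4"] = df.loc[mask, "col4"].str.replace(r"\W", "", regex=True)

print(df)
```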


2 answers

solution1
1 ACCEPTED 2021-05-05 19:17:19

solution2
0 2021-05-05 18:44:32
