
How to execute a script only under specific conditions using Python

I have been working on a script that performs cleaning operations on various columns.

I need to run those cleaning steps only when a specific condition is met.

For example:

if flag == 'Not feasible':
    # process the remaining steps

Input Data:

name   age   Contact    col4   col5  col6  flag
NKJ    48!   96754789   8886H  AHBZ        Not feasible
Tom    27    98468300   ^686H  ANKZ        feasible
Mike   28@   78915359   3256H  AK9Z        Not feasible
NKJ    48!   96754789   8886H  AHBZ        Not feasible
JKN8   35    96451188   3566H  NK4Z        Not feasible
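To make the snippets in this thread runnable, here is a minimal sketch that rebuilds the input above as a DataFrame. Assumptions: col6 is empty in every row, so it is filled with empty strings; age is kept as text because values such as 48! are not numbers; Contact is stored as integers.

```python
import pandas as pd

# Rebuild the question's sample input.
# Assumption: col6 is blank in every row, so it holds empty strings.
df = pd.DataFrame({
    'name':    ['NKJ', 'Tom', 'Mike', 'NKJ', 'JKN8'],
    'age':     ['48!', '27', '28@', '48!', '35'],   # text, since '48!' is not numeric
    'Contact': [96754789, 98468300, 78915359, 96754789, 96451188],
    'col4':    ['8886H', '^686H', '3256H', '8886H', '3566H'],
    'col5':    ['AHBZ', 'ANKZ', 'AK9Z', 'AHBZ', 'NK4Z'],
    'col6':    ['', '', '', '', ''],
    'flag':    ['Not feasible', 'feasible', 'Not feasible',
                'Not feasible', 'Not feasible'],
})
print(df)
```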

I want to run all of the cleaning steps only on rows where flag == 'Not feasible'.

The code I am trying to use:

if df['flag'] == 'Not feasible':   # this line raises the ValueError
    df['age'] = df['age'].replace(r'[^\d.]', '', regex=True).astype(float)
    df[['col4','col5']] = df[['col4','col5']].apply(lambda x: x.astype(str).str.replace(r'\W', '', regex=True))
    df['Contact'] = df['Contact'].replace(r'[^\d.]', '', regex=True).astype(float)

There are several more lines like these, but I don't understand how to run them only for rows where flag == 'Not feasible'.

When I use the condition above, I get this error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Any suggestions?
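For context, a minimal sketch of why that error appears: comparing a column with a string gives one True/False per row (a boolean Series), and an if statement cannot turn several booleans into one. The same Series is what pandas expects you to use for selecting rows. The three-row frame here is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'flag': ['Not feasible', 'feasible', 'Not feasible']})

# Comparing a column with a scalar yields one boolean per row (a Series).
mask = df['flag'] == 'Not feasible'
print(mask.tolist())  # [True, False, True]

# `if` needs a single True/False; pandas will not guess whether you mean
# "any row matches" or "all rows match", so it raises ValueError.
caught = None
try:
    if mask:
        pass
except ValueError as err:
    caught = str(err)
print(caught)

# The boolean Series is meant for selecting rows instead.
selected_index = df.loc[mask].index.tolist()
print(selected_index)  # [0, 2]
```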

To save you the effort of amending a large number of cleaning scripts for various columns, you can do it in these steps:

  1. First, extract the rows that should not be processed into another DataFrame.
  2. Redefine df as a copy of the rows that should be processed.
  3. Run your cleaning scripts with their original syntax.
  4. Concatenate the unprocessed rows (from step 1) back onto the cleaned results (from step 3), then call .sort_index() to restore the original row order.

import pandas as pd

df1 = df.loc[df['flag'] != 'Not feasible']               # Step 1: rows to leave as-is
df = df.loc[df['flag'] == 'Not feasible'].copy()         # Step 2: copy of rows to clean

# Run your cleaning code with the original syntax        # Step 3
df['age'] = df['age'].replace(r'[^\d.]', '', regex=True).astype(float)
df[['col4','col5']] = df[['col4','col5']].apply(lambda x: x.astype(str).str.replace(r'\W', '', regex=True))
df['Contact'] = df['Contact'].replace(r'[^\d.]', '', regex=True).astype(float)

df = pd.concat([df, df1]).sort_index()                   # Step 4: recombine, restore order

Result:

print(df)


   name   age     Contact   col4  col5 col6          flag
0   NKJ  48.0  96754789.0  8886H  AHBZ       Not feasible
1   Tom    27  98468300.0  ^686H  ANKZ           feasible
2  Mike  28.0  78915359.0  3256H  AK9Z       Not feasible
3   NKJ  48.0  96754789.0  8886H  AHBZ       Not feasible
4  JKN8  35.0  96451188.0  3566H  NK4Z       Not feasible

The unprocessed rows are combined back with the cleaned data, and calling .sort_index() after pd.concat() restores the original row order.
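If you need this in several places, the four steps can be wrapped in a small function. This is a sketch only: clean_where and clean are hypothetical names, not pandas APIs, and the sample frame is a three-row slice of the question's data.

```python
import pandas as pd

def clean_where(df, mask, clean):
    """Hypothetical helper: apply `clean` only to rows where `mask` is True."""
    untouched = df.loc[~mask]                              # Step 1: rows to skip
    selected = df.loc[mask].copy()                         # Step 2: copy of rows to clean
    selected = clean(selected)                             # Step 3: run the cleaning code
    return pd.concat([selected, untouched]).sort_index()   # Step 4: recombine, reorder

def clean(part):
    # Hypothetical cleaning function using the same operations as the answer.
    part['age'] = part['age'].replace(r'[^\d.]', '', regex=True).astype(float)
    part[['col4', 'col5']] = part[['col4', 'col5']].apply(
        lambda s: s.astype(str).str.replace(r'\W', '', regex=True))
    return part

df = pd.DataFrame({
    'age':  ['48!', '27', '28@'],
    'col4': ['8886H', '^686H', '3256H'],
    'col5': ['AHBZ', 'ANKZ', 'AK9Z'],
    'flag': ['Not feasible', 'feasible', 'Not feasible'],
})
result = clean_where(df, df['flag'] == 'Not feasible', clean)
print(result)
```

Note that the skipped rows keep their original strings, so after the concat a column such as age has object dtype holding both floats (48.0) and strings ('27'). That is why Tom's age prints as 27 rather than 27.0 in the result above.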

Have you tried filtering with a boolean mask? For example:

mask = df['flag'] == 'Not feasible'
df.loc[mask, 'age'] = df.loc[mask, 'age'].replace(r'[^\d.]', '', regex=True).astype(float)

Apply the same pattern to every other transformation you want to run on df.
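As a rough sketch of the boolean-mask approach applied to more than one column: compute the mask once, then use it on the left and right of each assignment. Passing a list of columns to .loc lets the col4/col5 cleanup run in a single statement. The three-row frame is an illustrative slice of the question's data.

```python
import pandas as pd

# Small slice of the question's data (only the columns being cleaned).
df = pd.DataFrame({
    'age':  ['48!', '27', '28@'],
    'col4': ['8886H', '^686H', '3256H'],
    'col5': ['AHBZ', 'ANKZ', 'AK9Z'],
    'flag': ['Not feasible', 'feasible', 'Not feasible'],
})

# Compute the row filter once and reuse it for every transformation.
mask = df['flag'] == 'Not feasible'

# Numeric cleanup: keep only digits and dots in the selected rows, then cast.
df.loc[mask, 'age'] = df.loc[mask, 'age'].replace(r'[^\d.]', '', regex=True).astype(float)

# Text cleanup on several columns at once: drop non-word characters.
cols = ['col4', 'col5']
df.loc[mask, cols] = df.loc[mask, cols].apply(
    lambda s: s.astype(str).str.replace(r'\W', '', regex=True))

print(df)
```

Rows where flag is 'feasible' are never touched, so Tom's '^686H' and '27' stay as they were.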
