I have been working on a script that runs cleaning steps on various columns.
I need to run those steps only when a specific condition is met.
For example:
if flag == 'Not feasible':
    # process the remaining steps
Input Data:
name age Contact col4 col5 col6 flag
NKJ 48! 96754789 8886H AHBZ Not feasible
Tom 27 98468300 ^686H ANKZ feasible
Mike 28@ 78915359 3256H AK9Z Not feasible
NKJ 48! 96754789 8886H AHBZ Not feasible
JKN8 35 96451188 3566H NK4Z Not feasible
I want to run all the cleaning steps only on rows where flag == 'Not feasible'.
The script I am trying to use:
if df['flag'] == 'Not feasible':   # df['flag'] is a whole column (Series), not one value
    df['age'] = df['age'].replace(r'[^\d.]', '', regex=True).astype(float)
    df[['col4','col5']] = df[['col4','col5']].apply(lambda x: x.astype(str).str.replace(r'\W', '', regex=True))
    df['Contact'] = df['Contact'].replace(r'[^\d.]', '', regex=True).astype(float)
We have several more lines like these, but I don't understand how to run them only for rows where flag == 'Not feasible'.
When I use the condition as shown above, I get the error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Any suggestions?
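A minimal sketch of why the error occurs, using a two-column sample assumed from the question's data: comparing a column to a string returns one boolean per row, and an `if` statement cannot reduce that Series to a single True/False. The variable names `mask` and `error_raised` are illustrative only.

```python
import pandas as pd

# Small sample mirroring the question's data (only the columns needed here)
df = pd.DataFrame({
    "name": ["NKJ", "Tom", "Mike"],
    "flag": ["Not feasible", "feasible", "Not feasible"],
})

# Comparing a column to a scalar yields one boolean per row, not a single bool
mask = df["flag"] == "Not feasible"
print(mask.tolist())  # [True, False, True]

# `if` needs a single True/False, so pandas raises instead of guessing
try:
    if mask:
        pass
    error_raised = False
except ValueError:
    error_raised = True
print(error_raised)  # True

# Use the boolean Series to select the matching rows instead
print(df.loc[mask, "name"].tolist())  # ['NKJ', 'Mike']
```

The fix is therefore to use the mask for row selection rather than as an `if` condition.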
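To save you the effort of amending a large number of cleaning scripts for various columns, you can do it in these steps:
1. Extract the rows that should not be processed into a separate DataFrame, df1.
2. Overwrite df with the extracted rows for processing, using a copy.
3. Run your cleaning code with its original syntax.
4. Combine the two parts with pd.concat() and call .sort_index() to restore their original sequence.

import pandas as pd

df1 = df.loc[df['flag'] != 'Not feasible']  # Step 1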
df = df.loc[df['flag'] == 'Not feasible'].copy()  # Step 2
# Run your cleaning code with its original syntax  # Step 3
df['age'] = df['age'].replace(r'[^\d.]', '', regex=True).astype(float)
df[['col4','col5']] = df[['col4','col5']].apply(lambda x: x.astype(str).str.replace(r'\W', '', regex=True))
df['Contact'] = df['Contact'].replace(r'[^\d.]', '', regex=True).astype(float)
df = pd.concat([df, df1]).sort_index()  # Step 4
Result:
print(df)
name age Contact col4 col5 col6 flag
0 NKJ 48.0 96754789.0 8886H AHBZ Not feasible
1 Tom 27 98468300.0 ^686H ANKZ feasible
2 Mike 28.0 78915359.0 3256H AK9Z Not feasible
3 NKJ 48.0 96754789.0 8886H AHBZ Not feasible
4 JKN8 35.0 96451188.0 3566H NK4Z Not feasible
The unprocessed rows are combined back with the cleaned rows, and the original row order is restored by calling .sort_index() after pd.concat().
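The four steps above can be bundled into a self-contained sketch. It assumes the sample data from the question, with `col6` left out because its values are not visible in the posted input and `Contact` stored as integers; the helper name `clean` and the variable names `skip`, `todo` and `result` are hypothetical.

```python
import pandas as pd

# Sample data based on the question; col6 is omitted (values not shown in the post)
df = pd.DataFrame({
    "name": ["NKJ", "Tom", "Mike", "NKJ", "JKN8"],
    "age": ["48!", "27", "28@", "48!", "35"],
    "Contact": [96754789, 98468300, 78915359, 96754789, 96451188],
    "col4": ["8886H", "^686H", "3256H", "8886H", "3566H"],
    "col5": ["AHBZ", "ANKZ", "AK9Z", "AHBZ", "NK4Z"],
    "flag": ["Not feasible", "feasible", "Not feasible", "Not feasible", "Not feasible"],
})

def clean(part: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper holding the cleaning steps from the answer."""
    # Strip everything except digits and dots, then convert to float
    part["age"] = part["age"].replace(r"[^\d.]", "", regex=True).astype(float)
    # Remove non-word characters from the text columns
    part[["col4", "col5"]] = part[["col4", "col5"]].apply(
        lambda s: s.astype(str).str.replace(r"\W", "", regex=True)
    )
    # Integer values are left untouched by the string regex, then cast to float
    part["Contact"] = part["Contact"].replace(r"[^\d.]", "", regex=True).astype(float)
    return part

skip = df.loc[df["flag"] != "Not feasible"]          # Step 1: rows left as-is
todo = df.loc[df["flag"] == "Not feasible"].copy()   # Step 2: rows to clean (a copy)
todo = clean(todo)                                   # Step 3: run the cleaning
result = pd.concat([todo, skip]).sort_index()        # Step 4: restore row order

print(result)
```

Because the skipped rows keep their original values, `age` ends up as a mixed object column (floats for cleaned rows, the string `'27'` for Tom), which matches the result shown above.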
Did you try filtering with a boolean mask? For example:
mask = df['flag'] == 'Not feasible'
df.loc[mask, 'age'] = df.loc[mask, 'age'].replace(r'[^\d.]', '', regex=True).astype(float)
Apply the same pattern to every other transformation you want to run on the df.
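One way to apply the boolean-mask approach to many columns without repeating the `.loc` expression is to keep the cleaning functions in a dictionary and loop over it. This is a sketch, not the answer's exact code: the `cleaners` mapping is a hypothetical structure, and the sample uses the first three rows of the question's data.

```python
import pandas as pd

# First three rows of the question's data (subset of columns)
df = pd.DataFrame({
    "age": ["48!", "27", "28@"],
    "col4": ["8886H", "^686H", "3256H"],
    "flag": ["Not feasible", "feasible", "Not feasible"],
})

# Hypothetical mapping of column name -> cleaning function; add one entry per step
cleaners = {
    "age": lambda s: s.replace(r"[^\d.]", "", regex=True).astype(float),
    "col4": lambda s: s.astype(str).str.replace(r"\W", "", regex=True),
}

# Compute the row filter once and reuse it for every column
mask = df["flag"] == "Not feasible"
for col, fn in cleaners.items():
    # Only masked rows are read and written back; index alignment keeps rows matched
    df.loc[mask, col] = fn(df.loc[mask, col])

print(df)
```

Rows where `flag` is `'feasible'` (here Tom's `'27'` and `'^686H'`) are never touched, so no split and re-concatenation is needed.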