How to Execute Query within Flag condition in Python

Question

I am trying to process an Excel dataset under a condition. The dataset is updated daily, so I have created a 'flag' column. Whenever new data is added, it is marked 'Not feasible' in the flag column; once it has been processed, the flag is updated to 'feasible'. An entry marked 'Not feasible' has not been processed yet, so the script needs to run on the rows whose flag value is 'Not feasible'.

What I need: run the cleaning process in a for loop (processing one row at a time), only on entries whose flag column value is 'Not feasible'.

After successful execution, I need to concatenate the processed data (df) with the unprocessed data (df1).

Input Data

name  Joining_Date      age   Contact    col4   col5  col6  flag

NKJ    4/26/2021        48!   96754789   8886H  AHBZ        Not feasible
Tom    26.4.2021        27    98468300   ^686H  ANKZ        feasible
Mike   2/27/2021        28@   78915359   3256H  AK9Z        Not feasible
NKJ    27.2.2021        48!   96754789   8886H  AHBZ        Not feasible
Adam   2/14/2021        18#   78915899   3256H  AK7Z        Not feasible
Steve  3/11/2021        23@   7891HI59   3256H  AK5Z        feasible
JKN    2/12/2021        35    96451188   3566H  NK4Z        Not feasible

Script I am using:

df = pd.read_excel(open(r'data.xlsx', 'rb'), sheet_name='sheet1')

df1 = df.loc[df['flag'] != 'Not feasible'] 
df = df.loc[df['flag'] == 'Not feasible'].copy()

for index, file in df.iterrows():
   # Run your cleaning codes with original syntax   
   try:
      file['Joining_Date'][index] = pd.to_datetime(file['Joining_Date'], errors='coerce')
      file['Joining_Date'][index] = file['Joining_Date'].dt.strftime('%Y-%m-%d')
      file['age'][index] = file['age'].replace('[^\d.]', '', regex=True).astype(float)
      file[['col4','col5']][index] = file[['col4','col5']].apply(lambda x: x.astype(str).str.replace('\W',''))
      file['Contact'][index] = file['Contact'].replace('[^\d.]', '', regex=True).astype(float)
      file['flag'][index] = "feasible"
   except ValueError:
      file['status'] = ValueError

   df = pd.concat([file, df1]).sort_index() 
writer = pd.ExcelWriter(r'data.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='sheet1', index=False)
writer.save()

Error at the line df = pd.concat([file, df1]).sort_index():

TypeError: '<' not supported between instances of 'int' and 'str'

Expected Output:

name  Joining_Date      age   Contact    col4   col5  col6  flag

NKJ    2021-4-26        48    96754789   8886H  AHBZ        feasible
Tom    26.4.2021        27    98468300   ^686H  ANKZ        feasible
Mike   2021-2-27        28    78915359   3256H  AK9Z        feasible
NKJ    2021-2-27        48    96754789   8886H  AHBZ        feasible
Adam   2021-2-14        18    78915899   3256H  AK7Z        feasible
Steve  3/11/2021        23@   7891HI59   3256H  AK5Z        feasible
JKN    2021-2-12        35    96451188   3566H  NK4Z        feasible

Please suggest.

Answer 1

You can remove the loop entirely and use vectorized column operations:

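import pandas as pd

# Split off the rows that are already processed; work on a copy of the rest.
df1 = df.loc[df['flag'] != 'Not feasible']
df = df.loc[df['flag'] == 'Not feasible'].copy()

# Parse dates (unparseable values become NaT), then format as YYYY-MM-DD.
df['Joining_Date'] = pd.to_datetime(df['Joining_Date'], errors='coerce')
df['Joining_Date'] = df['Joining_Date'].dt.strftime('%Y-%m-%d')
# Keep only digits and dots, then convert to float.
df['age'] = df['age'].replace(r'[^\d.]', '', regex=True).astype(float)
# Strip non-word characters from the text columns.
df[['col4','col5']] = df[['col4','col5']].apply(lambda x: x.astype(str).str.replace(r'\W', '', regex=True))
df['Contact'] = df['Contact'].replace(r'[^\d.]', '', regex=True).astype(float)
df['flag'] = "feasible"

# Recombine and restore the original row order.
df = pd.concat([df, df1]).sort_index()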
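
To try the vectorized approach without an Excel file, here is a minimal sketch on an in-memory frame. The frame `raw` and the names `pending` and `done` are made up for illustration and use three rows from the question's sample data; all values are assumed to arrive as strings. Rows whose dates use a different format than the rest may come back as NaT under errors='coerce', depending on the pandas version.

```python
import pandas as pd

# Hypothetical stand-in for data.xlsx: three rows from the question's sample.
raw = pd.DataFrame({
    "name": ["NKJ", "Tom", "Mike"],
    "Joining_Date": ["4/26/2021", "26.4.2021", "2/27/2021"],
    "age": ["48!", "27", "28@"],
    "Contact": ["96754789", "98468300", "78915359"],
    "col4": ["8886H", "^686H", "3256H"],
    "col5": ["AHBZ", "ANKZ", "AK9Z"],
    "flag": ["Not feasible", "feasible", "Not feasible"],
})

# Split into rows that still need cleaning and rows that are already done.
pending_mask = raw["flag"] == "Not feasible"
done = raw.loc[~pending_mask]
pending = raw.loc[pending_mask].copy()

# Clean only the pending rows, one whole column at a time.
pending["Joining_Date"] = (
    pd.to_datetime(pending["Joining_Date"], errors="coerce").dt.strftime("%Y-%m-%d")
)
for col in ["age", "Contact"]:
    # Drop everything except digits and dots, then convert to float.
    pending[col] = (
        pending[col].astype(str).str.replace(r"[^\d.]", "", regex=True).astype(float)
    )
for col in ["col4", "col5"]:
    # Drop non-word characters.
    pending[col] = pending[col].astype(str).str.replace(r"\W", "", regex=True)
pending["flag"] = "feasible"

# Both pieces kept their original integer index, so sort_index restores row order.
result = pd.concat([pending, done]).sort_index()
print(result)
```

The row for Tom was already flagged 'feasible', so it passes through untouched, including the '^' in col4.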

EDIT: A row-by-row solution is also possible, but the replacements must use re.sub instead of Series.replace, because each value in a row is a scalar:

import re

df1 = df.loc[df['flag'] != 'Not feasible']
df = df.loc[df['flag'] == 'Not feasible'].copy()

def test(x):
    # x is one row (a Series); every value in it is a scalar.
    try:
        x['Joining_Date'] = pd.to_datetime(x['Joining_Date'], errors='coerce')
        x['Joining_Date'] = x['Joining_Date'].strftime('%Y-%m-%d')
        # str() guards against numeric cells, which re.sub would reject.
        x['age'] = float(re.sub(r'[^\d.]', '', str(x['age'])))
        x['col4'] = re.sub(r'\W', '', str(x['col4']))
        x['col5'] = re.sub(r'\W', '', str(x['col5']))
        x['Contact'] = float(re.sub(r'[^\d.]', '', str(x['Contact'])))
        x['flag'] = "feasible"
    except ValueError as e:
        # Record the failure message; the flag stays 'Not feasible'.
        x['status'] = str(e)

    return x


df = df.apply(test, axis=1)

df = pd.concat([df, df1]).sort_index()
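
As for the TypeError in the question: inside the loop, `file` is a single row returned by iterrows(), which is a Series whose index holds the column names. Concatenating that Series with df1 produces an index that mixes string labels with the integer row labels of df1, and sort_index() cannot compare the two types. A minimal sketch of that effect, using a made-up two-column frame (the names `frame`, `row`, `done` and `combined` are illustrative only):

```python
import pandas as pd

frame = pd.DataFrame({
    "name": ["NKJ", "Tom"],
    "flag": ["Not feasible", "feasible"],
})
done = frame.loc[frame["flag"] != "Not feasible"]

# iterrows() yields (index, Series); the Series index is the column names.
_, row = next(frame.iterrows())
row_labels = list(row.index)

# Concatenating the row Series with a DataFrame stacks the column names
# on top of the DataFrame's integer row labels.
combined = pd.concat([row, done])
has_str_labels = any(isinstance(label, str) for label in combined.index)
has_non_str_labels = any(not isinstance(label, str) for label in combined.index)

# Sorting an index that mixes strings and integers fails.
try:
    combined.sort_index()
    sort_failed = False
except TypeError:
    sort_failed = True

print(row_labels, has_str_labels, has_non_str_labels, sort_failed)
```

Both solutions above avoid this because they concatenate two DataFrames that share the original integer index.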

solution1
2 ACCEPTED 2021-05-13 10:44:02
