I am trying to process an Excel dataset conditionally. The dataset is updated daily, so I have created a 'flag' column. Whenever new data is added, it is marked in the flag column as 'Not feasible'; once it has been processed, the flag is updated to 'feasible'. An entry marked 'Not feasible' has therefore not been processed yet, and the script should run only on rows whose flag is 'Not feasible'.
What I need: run the cleaning process in a for loop (processing one row at a time) only on entries whose flag is 'Not feasible'.
After successful execution, I need to concatenate the processed data (df) with the rows that were not processed (df1).
Input Data
name Joining_Date age Contact col4 col5 col6 flag
NKJ 4/26/2021 48! 96754789 8886H AHBZ Not feasible
Tom 26.4.2021 27 98468300 ^686H ANKZ feasible
Mike 2/27/2021 28@ 78915359 3256H AK9Z Not feasible
NKJ 27.2.2021 48! 96754789 8886H AHBZ Not feasible
Adam 2/14/2021 18# 78915899 3256H AK7Z Not feasible
Steve 3/11/2021 23@ 7891HI59 3256H AK5Z feasible
JKN 2/12/2021 35 96451188 3566H NK4Z Not feasible
Script using:
import pandas as pd
df = pd.read_excel(open(r'data.xlsx', 'rb'), sheet_name='sheet1')
df1 = df.loc[df['flag'] != 'Not feasible']
df = df.loc[df['flag'] == 'Not feasible'].copy()
for index, file in df.iterrows():
    # Run your cleaning codes with original syntax
    try:
        file['Joining_Date'][index] = pd.to_datetime(file['Joining_Date'], errors='coerce')
        file['Joining_Date'][index] = file['Joining_Date'].dt.strftime('%Y-%m-%d')
        file['age'][index] = file['age'].replace('[^\d.]', '', regex=True).astype(float)
        file[['col4','col5']][index] = file[['col4','col5']].apply(lambda x: x.astype(str).str.replace('\W',''))
        file['Contact'][index] = file['Contact'].replace('[^\d.]', '', regex=True).astype(float)
        file['flag'][index] = "feasible"
    except ValueError:
        file['status'] = ValueError
df = pd.concat([file, df1]).sort_index()
writer = pd.ExcelWriter(r'data.xlsx', engine='xlsxwriter')
df.to_excel(writer, sheet_name='sheet1', index=False)
writer.save()
Error: df = pd.concat([file, df1]).sort_index()
TypeError: '<' not supported between instances of 'int' and 'str'
Expected Output:
name Joining_Date age Contact col4 col5 col6 flag
NKJ 2021-4-26 48 96754789 8886H AHBZ feasible
Tom 26.4.2021 27 98468300 ^686H ANKZ feasible
Mike 2021-2-27 28 78915359 3256H AK9Z feasible
NKJ 2021-2-27 48 96754789 8886H AHBZ feasible
Adam 2021-2-14 18 78915899 3256H AK7Z feasible
Steve 3/11/2021 23@ 7891HI59 3256H AK5Z feasible
JKN 2021-2-12 35 96451188 3566H NK4Z feasible
Please Suggest.
You can remove the loop entirely and use vectorized column operations:
df1 = df.loc[df['flag'] != 'Not feasible']
df = df.loc[df['flag'] == 'Not feasible'].copy()
df['Joining_Date'] = pd.to_datetime(df['Joining_Date'], errors='coerce')
df['Joining_Date'] = df['Joining_Date'].dt.strftime('%Y-%m-%d')
df['age'] = df['age'].replace(r'[^\d.]', '', regex=True).astype(float)
df[['col4','col5']] = df[['col4','col5']].apply(lambda x: x.astype(str).str.replace(r'\W','', regex=True))
df['Contact'] = df['Contact'].replace(r'[^\d.]', '', regex=True).astype(float)
df['flag'] = "feasible"
df = pd.concat([df, df1]).sort_index()
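To try the vectorized approach without an Excel file, here is a minimal self-contained sketch. It assumes three rows copied from the question's input (col6 is omitted for brevity), and the names `pending`, `todo`, `done` and `result` are illustrative, not part of the original code. It also converts values with `astype(str)` before stripping characters, which is a small variation on the snippet above so that numeric cells are handled the same way as text cells.

```python
import pandas as pd

# Three rows copied from the question's input; col6 is omitted for brevity.
df = pd.DataFrame({
    "name": ["NKJ", "Tom", "Mike"],
    "Joining_Date": ["4/26/2021", "26.4.2021", "2/27/2021"],
    "age": ["48!", 27, "28@"],
    "Contact": [96754789, 98468300, 78915359],
    "col4": ["8886H", "^686H", "3256H"],
    "col5": ["AHBZ", "ANKZ", "AK9Z"],
    "flag": ["Not feasible", "feasible", "Not feasible"],
})

pending = df["flag"] == "Not feasible"
done = df.loc[~pending]          # already processed, left untouched
todo = df.loc[pending].copy()    # rows that still need cleaning

# Clean whole columns at once instead of looping over rows.
todo["Joining_Date"] = pd.to_datetime(todo["Joining_Date"], errors="coerce").dt.strftime("%Y-%m-%d")
todo["age"] = todo["age"].astype(str).str.replace(r"[^\d.]", "", regex=True).astype(float)
todo[["col4", "col5"]] = todo[["col4", "col5"]].apply(
    lambda s: s.astype(str).str.replace(r"\W", "", regex=True)
)
todo["Contact"] = todo["Contact"].astype(str).str.replace(r"[^\d.]", "", regex=True).astype(float)
todo["flag"] = "feasible"

# Both pieces keep their original integer index, so sort_index restores row order.
result = pd.concat([todo, done]).sort_index()
print(result)
```

The row for Tom, which was already flagged 'feasible', passes through unchanged, including its unformatted date.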
EDIT: A row-by-row solution is also possible, but it has to use re.sub instead of Series.replace, because each value in a row is a scalar:
df1 = df.loc[df['flag'] != 'Not feasible']
df = df.loc[df['flag'] == 'Not feasible'].copy()
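import re

def test(x):
    # x is one row (a Series), so every value accessed here is a scalar.
    try:
        x['Joining_Date'] = pd.to_datetime(x['Joining_Date'], errors='coerce')
        x['Joining_Date'] = x['Joining_Date'].strftime('%Y-%m-%d')
        # str() guards against cells that were read from Excel as numbers.
        x['age'] = float(re.sub(r'[^\d.]', '', str(x['age'])))
        x['col4'] = re.sub(r'\W', '', str(x['col4']))
        x['col5'] = re.sub(r'\W', '', str(x['col5']))
        x['Contact'] = float(re.sub(r'[^\d.]', '', str(x['Contact'])))
        x['flag'] = "feasible"
    except ValueError:
        # Record the failure; the flag of this row stays 'Not feasible'.
        x['status'] = 'ValueError'
    return x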
df = df.apply(test, axis=1)
df = pd.concat([df, df1]).sort_index()
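As for the TypeError in the question, it most likely comes from concatenating `file` with `df1`: after the loop, `file` is only the last row yielded by `iterrows()`, which is a Series indexed by the column names, while `df1` has an integer index. The small sketch below reproduces that situation with a hypothetical two-row frame (the names `row`, `mixed` and `sort_failed` are illustrative).

```python
import pandas as pd

df = pd.DataFrame({"name": ["NKJ", "Tom"], "flag": ["Not feasible", "feasible"]})
df1 = df.loc[df["flag"] != "Not feasible"]   # keeps its integer index

# iterrows() yields each row as a Series indexed by the column names.
index, row = next(df.iterrows())
row_labels = list(row.index)

# Concatenating that Series with df1 stacks string labels on top of integer labels.
mixed = pd.concat([row, df1])
mixed_labels = list(mixed.index)

try:
    mixed.sort_index()
    sort_failed = False
except TypeError:
    # str and int labels cannot be ordered with '<'
    sort_failed = True

print(row_labels, mixed_labels, sort_failed)
```

Both solutions above avoid this because they concatenate two DataFrames that share the original integer index.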