Excel formula in Python takes too long to process

Question

I am trying to apply an Excel formula that compares Sheet1 and Sheet2 to find available price updates and then applies the new price on Sheet3.

The formula is referenced in this article: https://www.get-digital-help.com/automate-excel-update-list-with-new-values-array-formula .

The script takes too long to process (15-20 minutes) and doesn't apply the formula down the sheet.

Please help!

import openpyxl

wb = openpyxl.load_workbook('New-Price.xlsx', read_only=False, keep_vba=True)
ws = wb['Sheet3']



for i, cellObj in enumerate(ws['C2:C'+str(ws.max_row)],2):
    cellObj[0].value = "=IF(ISERROR(MATCH(A2, Sheet2!$A$2:$A$2001, 0)), INDEX(Sheet1!$B$2:$B$5000, MATCH(A2, Sheet1!$A$2:$A$5000, 0)), INDEX(Sheet2!$B$2:$B$2001, MATCH(A2, Sheet2!$A$2:$A$2001, 0)))".format(i)
    wb.save('New-Price.xlsm')

Answer 1

Assuming you have Excel files looking like this -

1st File

	product	price
0	product1	$10
1	product2	$20
2	product3	$30
3	product4	$40

2nd File

	product	price
0	product1	20
1	product2	10

Notice that the price column in the 1st file contains the $ character. We need to do some preprocessing to remove that dollar sign. Let's load these 2 Excel files into pandas dataframes.

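import pandas as pd

# Read everything as strings so the '$' prefix can be stripped before converting to numbers.
df1 = pd.read_excel('<replace the string with file1 path>', dtype=str)
df2 = pd.read_excel('<replace the string with file2 path>', dtype=str)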

Let's preprocess the 1st dataframe now and convert the price column in both dataframes to float -

# Only needed when the price column contains a dollar sign.
# regex=False treats '$' as a literal character rather than a regex anchor.
df1['price'] = df1['price'].str.replace('$', '', regex=False).astype(float)
df2['price'] = df2['price'].astype(float)

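Now, dataframe 1 is the old dataframe that contains all the products and their old prices, and dataframe 2 contains fewer products with their updated prices. So, we can do a left merge, which keeps every product from dataframe 1 and leaves the new price empty (NaN) where dataframe 2 has no match.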

merged_df = df1.merge(df2, on='product', how='left').rename(columns={"price_x": "old_price", "price_y": "new_price"})

This is the output after the merge -

|   | product  | old_price | new_price |
|---|----------|-----------|-----------|
| 0 | product1 | 10        | 20.0      |
| 1 | product2 | 20        | 10.0      |
| 2 | product3 | 30        | NaN       |
| 3 | product4 | 40        | NaN       |

Now we can subtract column 'old_price' from column 'new_price' and store the price difference in final_price.

merged_df['final_price'] = merged_df['new_price'] - merged_df['old_price']

The final dataframe will look like this -

	product	old_price	new_price	final_price
0	product1	10	20.0	10.0
1	product2	20	10.0	-10.0
2	product3	30	NaN	NaN
3	product4	40	NaN	NaN

You can save this to Excel using -

merged_df.to_excel('<path to your output file>', index=False)

NOTE: you can drop any columns you don't need before saving.
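If you would rather keep the formula approach from the question, two things in the original loop look like the likely culprits. `wb.save` is called on every iteration, which would explain the long run time, and the formula string has no `{}` placeholder, so `.format(i)` leaves every row pointing at `A2`. Below is a minimal sketch of a fix, not a tested replacement for your workbook. The helper name `write_price_formulas` and the `FORMULA` template are hypothetical, and a plain dict stands in for the worksheet so the sketch runs without a file.

```python
# Formula from the question, with the row number turned into a placeholder.
FORMULA = (
    "=IF(ISERROR(MATCH(A{row}, Sheet2!$A$2:$A$2001, 0)), "
    "INDEX(Sheet1!$B$2:$B$5000, MATCH(A{row}, Sheet1!$A$2:$A$5000, 0)), "
    "INDEX(Sheet2!$B$2:$B$2001, MATCH(A{row}, Sheet2!$A$2:$A$2001, 0)))"
)


def write_price_formulas(ws, first_row, last_row):
    """Write the lookup formula into column C for each row in the range.

    `ws` only needs to support item assignment by cell address,
    which both an openpyxl worksheet and a plain dict do.
    """
    for row in range(first_row, last_row + 1):
        ws[f"C{row}"] = FORMULA.format(row=row)


# With openpyxl this would be used roughly as follows (not run here):
#   wb = openpyxl.load_workbook('New-Price.xlsx', read_only=False, keep_vba=True)
#   ws = wb['Sheet3']
#   write_price_formulas(ws, 2, ws.max_row)
#   wb.save('New-Price.xlsm')   # save once, after the loop

# Stand-in for a worksheet so the sketch runs without a workbook.
demo_sheet = {}
write_price_formulas(demo_sheet, 2, 4)
print(demo_sheet["C3"])
```

Saving once after the loop means the workbook is written a single time instead of once per row.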

Answer 2

Amazing! Thanks for putting this together so quickly. Are you able to adjust final_price so that it defaults to the new price and falls back to the old price wherever the new price is blank?

In other words, the expected output would look like this.

	product	old_price	new_price	final_price
0	product1	10	20.0	20.0
1	product2	20	10.0	10.0
2	product3	30	NaN	30
3	product4	40	NaN	40
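One way to get that output is to fill the gaps in new_price with old_price using `fillna`. This matches what the Excel formula in the question does: take the Sheet2 price when the product is found there, otherwise take the Sheet1 price. The sketch below assumes the column names from Answer 1 and builds the sample data in memory rather than reading the Excel files.

```python
import pandas as pd

# Sample data mirroring the tables above (held in memory instead of Excel files).
df1 = pd.DataFrame({
    "product": ["product1", "product2", "product3", "product4"],
    "price": [10.0, 20.0, 30.0, 40.0],
})
df2 = pd.DataFrame({
    "product": ["product1", "product2"],
    "price": [20.0, 10.0],
})

# Left merge keeps every product from df1; unmatched rows get NaN as new_price.
merged_df = df1.merge(df2, on="product", how="left").rename(
    columns={"price_x": "old_price", "price_y": "new_price"}
)

# Use the new price where one exists, otherwise fall back to the old price.
merged_df["final_price"] = merged_df["new_price"].fillna(merged_df["old_price"])

print(merged_df)
```

Because the fallback is done on whole columns, no per-row formula or loop is needed.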

2 answers

solution1
0 2021-04-13 04:18:48

solution2
0 2021-04-13 17:26:50
