
Excel formula in Python takes too long to process

I am trying to apply an Excel formula that compares sheets 1 and 2 to find available price updates and then applies the new price on sheet 3.

The formula is referenced in this article: https://www.get-digital-help.com/automate-excel-update-list-with-new-values-array-formula .

The script takes too long to run (15-20 minutes) and doesn't apply the formula down the sheet.

Please help!

import openpyxl

wb = openpyxl.load_workbook('New-Price.xlsx', read_only=False, keep_vba=True)
ws = wb['Sheet3']



for i, cellObj in enumerate(ws['C2:C'+str(ws.max_row)],2):
    cellObj[0].value = "=IF(ISERROR(MATCH(A2, Sheet2!$A$2:$A$2001, 0)), INDEX(Sheet1!$B$2:$B$5000, MATCH(A2, Sheet1!$A$2:$A$5000, 0)), INDEX(Sheet2!$B$2:$B$2001, MATCH(A2, Sheet2!$A$2:$A$2001, 0)))".format(i)
    wb.save('New-Price.xlsm')
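Two details in the loop above probably account for both symptoms. The formula string has no `{}` placeholder, so `.format(i)` leaves every row pointing at `A2`, and `wb.save` runs once per row instead of once at the end. Below is a minimal sketch of a per-row formula builder, assuming the sheet names and ranges from the question; `FORMULA_TEMPLATE` and `build_formula` are hypothetical names introduced here for illustration.

```python
# Template copied from the question, with the row number of the lookup
# cell turned into a placeholder. The names below are illustrative only.
FORMULA_TEMPLATE = (
    "=IF(ISERROR(MATCH(A{row}, Sheet2!$A$2:$A$2001, 0)), "
    "INDEX(Sheet1!$B$2:$B$5000, MATCH(A{row}, Sheet1!$A$2:$A$5000, 0)), "
    "INDEX(Sheet2!$B$2:$B$2001, MATCH(A{row}, Sheet2!$A$2:$A$2001, 0)))"
)


def build_formula(row: int) -> str:
    """Return the lookup formula with A<row> as the lookup cell."""
    return FORMULA_TEMPLATE.format(row=row)


# Formulas for rows 2, 3 and 4; each one references its own row in column A.
formulas = [build_formula(r) for r in range(2, 5)]
for formula in formulas:
    print(formula)
```

In the loop, that would mean assigning `build_formula(i)` to `cellObj[0].value` and calling `wb.save('New-Price.xlsm')` a single time after the loop finishes.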

Assuming you have Excel files that look like this -

1st File

|   | product  | price |
|---|----------|-------|
| 0 | product1 | $10   |
| 1 | product2 | $20   |
| 2 | product3 | $30   |
| 3 | product4 | $40   |

2nd File

|   | product  | price |
|---|----------|-------|
| 0 | product1 | 20    |
| 1 | product2 | 10    |

Notice that the price column in the 1st file contains a $ character, so some preprocessing is needed to remove the dollar sign. Let's load these 2 Excel files into pandas dataframes.

import pandas as pd

# dtype=str keeps every cell as text so the "$" prefix can be stripped later
df1 = pd.read_excel('<replace the string with file1 path>', dtype=str)
df2 = pd.read_excel('<replace the string with file2 path>', dtype=str)

Now preprocess the 1st dataframe and convert the price column in both dataframes to float -

# Only needed when the price column contains a dollar sign.
# regex=False treats '$' as a literal character, not as the end-of-string anchor.
df1['price'] = df1['price'].str.replace('$', '', regex=False).astype(float)
df2['price'] = df2['price'].astype(float)
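To check the cleaning step without touching any files, here is a small sketch on in-memory data. The sample frame is an assumption that mirrors the 1st file shown above, with prices held as strings the way `dtype=str` would load them.

```python
import pandas as pd

# Sample data standing in for the 1st file; prices are strings with a "$" prefix.
df1 = pd.DataFrame({
    "product": ["product1", "product2", "product3", "product4"],
    "price": ["$10", "$20", "$30", "$40"],
})

# Strip the literal "$" and convert the remaining text to float.
df1["price"] = df1["price"].str.replace("$", "", regex=False).astype(float)

print(df1.dtypes)
print(df1)
```

Passing `regex=False` explicitly avoids depending on the default, which has differed between pandas versions.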

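Now, dataframe 1 is the old dataframe that contains all the products and their old prices, while dataframe 2 contains fewer products with their updated prices. So, we can do a left merge, which keeps every product from dataframe 1 and leaves the new price empty (NaN) where dataframe 2 has no match.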

merged_df = df1.merge(df2, on='product', how='left').rename(columns={"price_x": "old_price", "price_y": "new_price"})

This is the output after the merge -

|   | product  | old_price | new_price |
|---|----------|-----------|-----------|
| 0 | product1 | 10        | 20.0      |
| 1 | product2 | 20        | 10.0      |
| 2 | product3 | 30        | NaN       |
| 3 | product4 | 40        | NaN       |

Now we can subtract column 'old_price' from column 'new_price' to get the final_price.

merged_df['final_price'] = merged_df['new_price'] - merged_df['old_price']

The final dataframe will look like this -

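|   | product  | old_price | new_price | final_price |
|---|----------|-----------|-----------|-------------|
| 0 | product1 | 10        | 20.0      | 10.0        |
| 1 | product2 | 20        | 10.0      | -10.0       |
| 2 | product3 | 30        | NaN       | NaN         |
| 3 | product4 | 40        | NaN       | NaN         |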

You can save this to an Excel file using -

merged_df.to_excel('<path to your output file>', index=False)

NOTE: You can drop any columns you don't need before saving.

Amazing! Thanks for putting this together so quickly. Are you able to adjust the final price so that it defaults to the new price and falls back to the old price wherever the new price is blank?

In other words, the expected output would look like this.

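|   | product  | old_price | new_price | final_price |
|---|----------|-----------|-----------|-------------|
| 0 | product1 | 10        | 20.0      | 20.0        |
| 1 | product2 | 20        | 10.0      | 10.0        |
| 2 | product3 | 30        | NaN       | 30          |
| 3 | product4 | 40        | NaN       | 40          |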
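One way to get that output, sketched under the assumption that the merged frame has the `old_price` and `new_price` columns from the answer above, is to fill the missing new prices with the old ones using `fillna`. The sample frames below are stand-ins for the two files, with prices already converted to float.

```python
import pandas as pd

# Sample frames mirroring the two files, prices already numeric.
df1 = pd.DataFrame({
    "product": ["product1", "product2", "product3", "product4"],
    "price": [10.0, 20.0, 30.0, 40.0],
})
df2 = pd.DataFrame({
    "product": ["product1", "product2"],
    "price": [20.0, 10.0],
})

# Left merge keeps every product from df1; unmatched new prices become NaN.
merged_df = df1.merge(df2, on="product", how="left").rename(
    columns={"price_x": "old_price", "price_y": "new_price"}
)

# Take the new price where one exists, otherwise fall back to the old price.
merged_df["final_price"] = merged_df["new_price"].fillna(merged_df["old_price"])

print(merged_df)
```

This replaces the subtraction step, so `final_price` holds an actual price for every product rather than a price difference.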
