Compare multiple cells in openpyxl

I need to compare multiple cells in openpyxl, but I have not been successful. To be more precise, I have an .xlsx file that I import into my Python script; it contains 4 columns and around 70,000 rows. Rows whose first 3 columns are identical must be merged into a single row, with the values in the fourth column added together.

For example:

Row 1 Type of material: A | Location: NY | Month of sale: January | Cost: 100

..

Row 239 Type of material: A | Location: NY | Month of sale: January | Cost: 150

..

Row 1020 Type of material: A | Location: NY | Month of sale: January | Cost: 80

..

etc

Assuming these were the only matching rows, a new table must be generated (for example, in a new sheet) in which they appear as a single row:

Type of material: A | Location: NY | Month of sale: January | Cost: 330 (sum of costs)

And so on for all the data in the .xlsx file, to get a new consolidated table.

I hope to have been clear with the explanation, but if it was not, I can be even more precise if necessary.

As I mentioned at the beginning, I have not been successful so far, so I will appreciate any help!

Thank you very much

Instead of reading it via openpyxl, I would use pandas:

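import pandas as pd

# filename: path to the .xlsx file; header=0 uses the first row as column labels
raw_data = pd.read_excel(filename, header=0)

# Sum 'Cost' over all rows that share the same first three columns
summary = raw_data.groupby(['Type of material', 'Location', 'Month of sale'])['Cost'].sum()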
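
To see what the grouping produces, here is a minimal sketch on the three sample rows from the question. The DataFrame is built in memory rather than read from a file, and the fourth row is invented only so that a second group exists. Passing `as_index=False` keeps the three key columns as ordinary columns, which gives a flat table rather than a Series with a three-level index.

```python
import pandas as pd

# Sample rows from the question; the last row is invented to show a second group.
raw_data = pd.DataFrame(
    [
        ("A", "NY", "January", 100),
        ("A", "NY", "January", 150),
        ("A", "NY", "January", 80),
        ("B", "LA", "March", 40),
    ],
    columns=["Type of material", "Location", "Month of sale", "Cost"],
)

key_columns = ["Type of material", "Location", "Month of sale"]

# as_index=False keeps the keys as regular columns, so the result is a flat table.
summary = raw_data.groupby(key_columns, as_index=False)["Cost"].sum()
print(summary)
```

The three A / NY / January rows collapse into one row with a cost of 330. Writing the table back out should then be possible with something like `summary.to_excel("consolidated.xlsx", index=False)`, where the output filename is only a placeholder.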

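If this raises a KeyError, the column labels in your file do not match the names used above, so you'll need to fix the labels (check raw_data.columns to see what pandas actually read).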
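
One possible cause of a label mismatch is a header cell with stray whitespace or different capitalisation. The sketch below assumes that is the problem; the untidy labels are invented for illustration and are not taken from the asker's file. It prints the labels pandas read, normalises them, and then groups as before.

```python
import pandas as pd

# Hypothetical labels as they might come out of a spreadsheet with untidy headers.
raw_data = pd.DataFrame(
    [("A", "NY", "January", 100), ("A", "NY", "January", 150)],
    columns=[" Type of material", "Location ", "Month of sale", "cost"],
)

# Inspect what pandas actually read before touching anything.
print(list(raw_data.columns))

# Strip surrounding whitespace, then rename any label that still differs.
raw_data.columns = raw_data.columns.str.strip()
raw_data = raw_data.rename(columns={"cost": "Cost"})

summary = raw_data.groupby(["Type of material", "Location", "Month of sale"])["Cost"].sum()
print(summary)
```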
