Compare multiple cells in openpyxl

Question

I need to compare multiple cells in openpyxl, but I have not been successful. To be more precise, I have an .xlsx file with 4 columns and around 70,000 rows that I import into my Python script. Rows that share the same values in the first 3 columns must be merged into a single row, with the values in the fourth column added together.

For example:

Row 1: Type of material: A | Location: NY | Month of sale: January | Cost: 100

..

Row 239: Type of material: A | Location: NY | Month of sale: January | Cost: 150

..

Row 1020: Type of material: A | Location: NY | Month of sale: January | Cost: 80

..

etc

Assuming these were the only matching rows, a new table must be generated (for example, in a data sheet) in which they appear as a single row:

Type of material: A | Location: NY | Month of sale: January | Cost: 330 (sum of costs)

And so on for all the data in the .xlsx file, to produce a new consolidated table.

I hope the explanation is clear; if it is not, I can be more precise.

As I mentioned at the beginning, I have not been successful so far, so I will appreciate any help!

Thank you very much

Answer 1

Instead of reading it via openpyxl, I would use pandas:

import pandas as pd

# `filename` is the path to your .xlsx file
raw_data = pd.read_excel(filename, header=0)
# Group on the first three columns and sum the fourth
summary = raw_data.groupby(['Type of material', 'Location', 'Month of sale'])['Cost'].sum()
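To see what the grouping produces without needing the real file, here is a minimal sketch on a small in-memory table. The sample rows are made up to mirror the question's example, and `key_columns` is just an illustrative name. Passing `as_index=False` is my own choice here, so that the three key columns come back as ordinary columns rather than as an index.

```python
import pandas as pd

# Illustrative rows modelled on the question's example (not real data).
raw_data = pd.DataFrame(
    {
        "Type of material": ["A", "A", "B", "A"],
        "Location": ["NY", "NY", "NY", "NY"],
        "Month of sale": ["January", "January", "January", "January"],
        "Cost": [100, 150, 40, 80],
    }
)

key_columns = ["Type of material", "Location", "Month of sale"]

# as_index=False keeps the key columns as regular columns in the result,
# giving one row per unique (material, location, month) combination.
summary = raw_data.groupby(key_columns, as_index=False)["Cost"].sum()
print(summary)
```

The three "A / NY / January" rows collapse into one row with a cost of 330. From there, something like `summary.to_excel("consolidated.xlsx", index=False)` should write the table to a new workbook (the file name is a placeholder).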

If the groupby call raises a KeyError, the column labels in your file do not match the names used above, so you'll need to fix the labels.
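If you would rather stay without pandas, the same consolidation can be sketched in plain Python with a dictionary keyed on the first three columns. This is only a rough sketch: `consolidate` is a hypothetical helper name, and it assumes every row has exactly four values with a numeric cost in the last position. With openpyxl, the rows could come from something like `ws.iter_rows(min_row=2, values_only=True)`, where `ws` is your worksheet.

```python
from collections import defaultdict


def consolidate(rows):
    """Sum the fourth value of each row, grouped by the first three values."""
    totals = defaultdict(int)
    for material, location, month, cost in rows:
        # The tuple of the first three columns identifies a group.
        totals[(material, location, month)] += cost
    # One output row per group, in order of first appearance.
    return [key + (total,) for key, total in totals.items()]


# Illustrative rows modelled on the question's example (not real data).
rows = [
    ("A", "NY", "January", 100),
    ("A", "NY", "January", 150),
    ("B", "NY", "January", 40),
    ("A", "NY", "January", 80),
]

for row in consolidate(rows):
    print(row)
```

Each resulting tuple could then be written to a new sheet with `ws.append(row)`. A single pass over 70,000 rows like this should be quick, since each row costs one dictionary lookup.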
