I need to compare multiple cells with openpyxl, but I have not been successful. More precisely, I have an .xlsx file, imported into my Python script, that contains 4 columns and around 70,000 rows. Rows that have the same values in the first 3 columns must be merged into a single row, summing the numbers in the fourth column.
For example:
Row 1 .. Type of material: A | Location: NY | Month of sale: January | Cost: 100
..
Row 239 Type of material: A | Location: NY | Month of sale: January | Cost: 150
..
Row 1020 Type of material: A | Location: NY | Month of sale: January | Cost: 80
..
etc
Assuming these were the only matching rows, a new table must be generated (for example, in a new sheet) containing a single row like this:
Type of material: A | Location: NY | Month of sale: January | Cost: 330 (sum of costs)
And so on for all the data in the .xlsx file, producing a new consolidated table.
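The consolidation rule described above can be sketched in plain Python with a dictionary keyed on the first three columns. This is a minimal sketch assuming each row is a `(material, location, month, cost)` tuple; the function name `consolidate_rows` is hypothetical. With openpyxl, such tuples could come from `ws.iter_rows(min_row=2, values_only=True)`, which skips a header row.

```python
from collections import defaultdict

def consolidate_rows(rows):
    """Sum the cost of rows sharing (material, location, month).

    Assumes each row is a (material, location, month, cost) tuple.
    """
    totals = defaultdict(int)
    for material, location, month, cost in rows:
        totals[(material, location, month)] += cost
    # One output row per distinct key, in first-seen order
    return [(*key, total) for key, total in totals.items()]

rows = [
    ("A", "NY", "January", 100),
    ("A", "NY", "January", 150),
    ("A", "NY", "January", 80),
]
print(consolidate_rows(rows))  # [('A', 'NY', 'January', 330)]
```

A single pass over the rows keeps this linear in the number of rows, which is comfortable for around 70,000 of them.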
I hope my explanation is clear; if not, I can provide more detail.
As I mentioned at the beginning, I have not managed to do this so far, so I would appreciate any help!
Thank you very much
Instead of reading it via openpyxl, I would use pandas:
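import pandas as pd
filename = "data.xlsx"  # hypothetical path to your input workbook
raw_data = pd.read_excel(filename, header=0)
# Group rows sharing the first three columns and sum their costs
summary = raw_data.groupby(['Type of material', 'Location', 'Month of sale'])['Cost'].sum()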
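To see what the `groupby` call produces, here is a self-contained sketch that builds a small DataFrame in memory instead of reading the .xlsx file (the data mirrors the example in the question, plus one extra row). Passing `as_index=False` keeps the three grouping columns as ordinary columns, so the result is a flat table rather than a Series with a MultiIndex. Writing it to a new sheet with `DataFrame.to_excel` is left as a comment because it needs a real file and an Excel engine such as openpyxl; the file and sheet names there are hypothetical, as is the helper name `summarize`.

```python
import pandas as pd

KEYS = ["Type of material", "Location", "Month of sale"]

def summarize(raw_data: pd.DataFrame) -> pd.DataFrame:
    # Group rows sharing the three key columns and sum their costs;
    # as_index=False returns a flat table instead of a MultiIndex Series.
    return raw_data.groupby(KEYS, as_index=False)["Cost"].sum()

raw_data = pd.DataFrame({
    "Type of material": ["A", "A", "A", "B"],
    "Location": ["NY", "NY", "NY", "NY"],
    "Month of sale": ["January", "January", "January", "January"],
    "Cost": [100, 150, 80, 40],
})
summary = summarize(raw_data)
print(summary)

# Hypothetical output file and sheet name; requires an Excel writer engine:
# summary.to_excel("summary.xlsx", sheet_name="Summary", index=False)
```

By default `groupby` sorts the result by the key columns, so the consolidated table comes out ordered by material, location and month.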
If this raises a KeyError, you'll need to adjust the column labels in the code so they match the headers in your file exactly.
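One common cause of such a KeyError is a header that differs slightly from the name used in the code, for example a trailing space or different capitalization. A minimal sketch, assuming the headers are strings and that stray whitespace is the problem, normalizes the headers and reports which expected columns are still missing; `check_columns` and `EXPECTED` are hypothetical names.

```python
import pandas as pd

EXPECTED = ["Type of material", "Location", "Month of sale", "Cost"]

def check_columns(df: pd.DataFrame) -> list:
    # Strip leading/trailing whitespace from every (string) header in place
    df.columns = df.columns.str.strip()
    # Return the expected labels that are still absent after cleaning
    return [name for name in EXPECTED if name not in df.columns]

df = pd.DataFrame(columns=["Type of material ", " Location", "Month of sale", "cost"])
print(check_columns(df))  # ['Cost']
```

Here the whitespace problems are fixed, but `cost` still differs in case from `Cost`, so it is reported; printing `raw_data.columns.tolist()` is a quick way to see the exact labels pandas read from the file.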