Best structure for replacing a dictionary of tuples

Question

I have a requirement to read a huge CSV with the following structure:

Product Category, Length, Width, Height, Weight
category1,4,4,3,100
category2,5,2,3,150
category1,9,3,3,150
category3,2,2,2,50

After reading this CSV, I need to get the average measurements for each category. As a first step before calculating the averages, I thought about looping over the CSV row by row and copying the values into a dictionary of tuples, where each key is a category and each tuple holds the number of products in that category followed by the sum of each measurement. Something like this:

category1: (2,13,7,6,250)
category2: (1,5,2,3,150)
category3: (1,2,2,2,50)

I'm quite new to Python, so I didn't realize until now that tuples are immutable, which means I can't update a tuple stored in the dictionary when I find new measurements for a category that is already there. My question is: for this kind of requirement, what data structure would you recommend? And how would you set and update these measurements?

Answer 1

You could get the average for each category by using the pandas groupby() function (and pandas DataFrames).

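import pandas as pd

# Replace FILE_NAME.csv with the path to your CSV file.
df = pd.read_csv("FILE_NAME.csv")
# Group the rows by category and average every remaining column.
averages_df = df.groupby(by=["Product Category"]).mean()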

This creates a DataFrame with one row per unique value of Product Category, holding the average of each remaining column for that category.

If your data looks like this:

>>> df
      Product Category  Weight  Price
0            Fruit       1      2
1        Vegetable       2      3
2            Fruit       3      6

then averages_df will look like this:

>>> averages_df
                  Weight  Price
Product Category               
Fruit                2.0    4.0
Vegetable            2.0    3.0

To access the means for a specific category, you can locate by index.

>>> averages_df.loc["Fruit"]
Weight    2.0
Price     4.0
Name: Fruit, dtype: float64

To access the mean for a specific category and column, you can locate by index and column.

>>> averages_df.loc["Fruit", "Price"]
4.0
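Applied to the sample rows from the question, the same approach might look like the sketch below. It assumes the header really contains a space after each comma, as shown in the question; in that case skipinitialspace=True keeps the column names free of leading blanks. The SAMPLE string is only a stand-in for the real file.

```python
import io

import pandas as pd

# Stand-in for the real CSV file, using the rows shown in the question.
SAMPLE = """Product Category, Length, Width, Height, Weight
category1,4,4,3,100
category2,5,2,3,150
category1,9,3,3,150
category3,2,2,2,50
"""

# skipinitialspace=True drops the blank that follows each comma in the header,
# so the columns are named "Length", "Width", ... rather than " Length", " Width", ...
df = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)

# One row per category, one column per averaged measurement.
averages_df = df.groupby("Product Category").mean()
print(averages_df)
```

For a file too large to load at once, read_csv also accepts a chunksize argument, in which case per-chunk sums and counts would need to be combined before dividing.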

Answer 2

You can use pandas and iterrows and parse the data as you wish:

import pandas as pd

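# One row per product category with its three measurements.
data = {"Product Category": ["category1", "category2", "category3"],
        "Length": [4, 5, 9], "Width": [4, 2, 3], "Height": [3, 3, 3]}
df = pd.DataFrame(data=data)
parsed_data = {}
# Map each category to a tuple of its measurements; a repeated category
# would overwrite its earlier entry.
for index, row in df.iterrows():
    parsed_data[row["Product Category"]] = (row["Length"], row["Width"], row["Height"])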
print(parsed_data)

Outputs:

{'category1': (4, 4, 3), 'category2': (5, 2, 3), 'category3': (9, 3, 3)}
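If pandas is not an option, the running-total structure described in the question can be sketched with a dictionary of lists, since lists (unlike tuples) can be updated in place. This is only a minimal sketch: the accumulate and averages helpers and the SAMPLE string are illustrative names, not part of either answer, and the rows are the ones from the question.

```python
import csv
import io
from collections import defaultdict

# Stand-in for the real CSV file, using the rows shown in the question.
SAMPLE = """Product Category, Length, Width, Height, Weight
category1,4,4,3,100
category2,5,2,3,150
category1,9,3,3,150
category3,2,2,2,50
"""


def accumulate(lines):
    """Return {category: [count, length_sum, width_sum, height_sum, weight_sum]}."""
    # A list is mutable, so each running total can be updated in place.
    totals = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0])
    reader = csv.reader(lines, skipinitialspace=True)
    next(reader)  # skip the header row
    for row in reader:
        if not row:  # ignore blank lines
            continue
        category, *measurements = row
        entry = totals[category]
        entry[0] += 1
        for i, value in enumerate(measurements, start=1):
            entry[i] += float(value)
    return dict(totals)


def averages(totals):
    """Divide each sum by the number of rows seen for its category."""
    return {
        category: tuple(total / entry[0] for total in entry[1:])
        for category, entry in totals.items()
    }


totals = accumulate(io.StringIO(SAMPLE))
result = averages(totals)
print(result["category1"])  # (6.5, 3.5, 3.0, 125.0)
```

Because csv.reader yields one row at a time, passing an open file object instead of the io.StringIO wrapper keeps memory use low even for a huge CSV.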

2 answers

solution1
3 ACCEPTED 2021-04-29 14:36:26

solution2
1 2021-04-29 14:29:48
