
Best structure for replacing a dictionary of tuples

I have a requirement to read a huge CSV with the following structure:

Product Category, Length, Width, Height, Weight
category1,4,4,3,100
category2,5,2,3,150
category1,9,3,3,150
category3,2,2,2,50

After reading this CSV, I need to get the average measurements for each category. As a first step before calculating the averages, I thought about looping over the CSV row by row and copying the values into a dictionary of tuples, where each key is a category and each tuple holds the number of products in that category followed by the sum of each measurement. Something like this:

category1: (2,13,7,6,250)
category2: (1,5,2,3,150)
category3: (1,2,2,2,50)

I'm quite new to Python, so I didn't realize until now that tuples are immutable, which means I can't update a tuple stored in the dictionary when I find new measurements for a category that is already there. My question is: for this kind of requirement, what data structure would you recommend? And how would you set and update these measurements?

You could get the average for each category by loading the file into a pandas DataFrame and using its groupby() method.

import pandas as pd

df = pd.read_csv("FILE_NAME.csv")  # replace with the path to your CSV file
averages_df = df.groupby(by=["Product Category"]).mean()

This creates a DataFrame with one row per unique value of Product Category, where each remaining column holds the average of that column for the category.

If your data looks like this:

>>> df
      Product Category  Weight  Price
0            Fruit       1      2
1        Vegetable       2      3
2            Fruit       3      6

then averages_df will look like this:

>>> averages_df
                  Weight  Price
Product Category               
Fruit                2.0    4.0
Vegetable            2.0    3.0

and to access the means for a specific category you can locate by index.

>>> averages_df.loc["Fruit"]
Weight    2.0
Price     4.0
Name: Fruit, dtype: float64

To access the mean for a specific category and column you can locate by index and column.

>>> averages_df.loc["Fruit","Price"]
4.0

You can also use pandas with iterrows() and parse the data row by row as you wish:

import pandas as pd

data = {"Product": ["category1","category2","category3"],
        "Category": [4,5,9], "Length": [4,2,3], "Width": [3,3,3]}
df = pd.DataFrame(data=data)
parsed_data = {}
for index, row in df.iterrows():
    parsed_data[row["Product"]] = (row["Category"], row["Length"], row["Width"])
print(parsed_data)

Outputs:

{'category1': (4, 4, 3), 'category2': (5, 2, 3), 'category3': (9, 3, 3)}
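
If you would rather stay with the running-sums idea from the question, a dictionary of lists works where tuples do not, because a list can be updated in place. Below is a minimal sketch using only the standard library. The names category_averages and FIELDS are made up for illustration, the sample rows are the ones from the question, and skipinitialspace=True assumes the real header has a space after each comma, as shown there. It reads one row at a time, so the whole file never has to fit in memory.

```python
import csv
import io

# Sample rows copied from the question. For a real file, pass
# open(path, newline="") to category_averages instead of io.StringIO.
SAMPLE = """Product Category, Length, Width, Height, Weight
category1,4,4,3,100
category2,5,2,3,150
category1,9,3,3,150
category3,2,2,2,50
"""

# Measurement columns to average (assumed to match the CSV header).
FIELDS = ["Length", "Width", "Height", "Weight"]


def category_averages(lines):
    """Return {category: {field: mean}} from an iterable of CSV lines."""
    # category -> [count, sum_length, sum_width, sum_height, sum_weight]
    totals = {}
    reader = csv.DictReader(lines, skipinitialspace=True)
    for row in reader:
        # setdefault creates the mutable accumulator the first time
        # a category is seen and returns the existing one afterwards.
        acc = totals.setdefault(row["Product Category"], [0] + [0.0] * len(FIELDS))
        acc[0] += 1
        for i, field in enumerate(FIELDS, start=1):
            acc[i] += float(row[field])
    # Divide each sum by the row count to get the per-category mean.
    return {
        category: {field: acc[i] / acc[0] for i, field in enumerate(FIELDS, start=1)}
        for category, acc in totals.items()
    }


averages = category_averages(io.StringIO(SAMPLE))
print(averages["category1"])
```

For category1 the accumulator ends up as [2, 13.0, 7.0, 6.0, 250.0], which matches the tuple sketched in the question, and the averages follow from dividing by the count.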
