
For loop over a DataFrame in Python

I have a dataframe called df_civic with the columns state, rank, make/model, model year and thefts. I want to calculate the average (AVG) and standard deviation (STD) of thefts for each model year.

I get all the years in the dataframe with: years_civic = list(pd.unique(df_civic['Model Year']))

My loop looks like this:

for civic_year in years_civic:
    f = df_civic['Model Year'] == civic_year
    civic_avg = df_civic[f]['Thefts'].mean()
    civic_std = df_civic[f]['Thefts'].std()
    civic_std= np.round(car_std,2)
    civic_avg= np.round(car_avg,2)
    print(civic_avg, civic_std, np.sum(f))

However, the output is not what I need. The only correct value is the one from np.sum(f).

The output currently looks like this:

9.0 20.51 1
9.0 20.51 1
9.0 20.51 1
9.0 20.51 1
9.0 20.51 13
9.0 20.51 15
9.0 20.51 3
9.0 20.51 2

Pandas provides many powerful functions for aggregating data. It's usually better to look for one of these before writing a for loop.

For instance, you can use:

import pandas as pd

# Mean and sample standard deviation of thefts for each model year
df_civic.groupby("Model Year").agg({"Thefts": ["mean", "std"]})
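Here is a self-contained sketch of that groupby call that you can run without the original data. The dataframe below is made up (hypothetical values). The column names follow the question's code, so the aggregated column is "Thefts". The string "std" gives the sample standard deviation (ddof=1), the same as Series.std() in the question's loop. The second call uses named aggregation, which produces flat column names instead of a two-level column index.

```python
import pandas as pd

# Hypothetical sample data; the real df_civic also has state, rank and make/model columns
df_civic = pd.DataFrame({
    "Model Year": [1998, 1998, 2000, 2000, 2000],
    "Thefts": [10, 20, 3, 6, 9],
})

# One row per model year, with ("Thefts", "mean") and ("Thefts", "std") columns
stats = df_civic.groupby("Model Year").agg({"Thefts": ["mean", "std"]}).round(2)
print(stats)

# Named aggregation: same numbers, but flat column names
flat = df_civic.groupby("Model Year").agg(
    thefts_avg=("Thefts", "mean"),
    thefts_std=("Thefts", "std"),
).round(2)
print(flat)
```

Both tables contain one row per model year, so a single call replaces the whole loop.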

More documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.agg.html

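Regarding your code, there is a bug: car_std and car_avg are not defined in the loop. You compute civic_avg and civic_std, then overwrite them with np.round(car_avg, 2) and np.round(car_std, 2). That is why every iteration prints the same two values, presumably left over from earlier code that defined car_avg and car_std.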
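For comparison, here is a minimal sketch of the loop with that bug fixed. It assumes the question's column names and uses a small made-up dataframe (hypothetical values). Each iteration rounds the values it just computed, and .loc[f, "Thefts"] selects the rows and column in one step instead of chained indexing. Because Series.std() defaults to ddof=1, the numbers match the groupby result.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real df_civic
df_civic = pd.DataFrame({
    "Model Year": [1998, 1998, 2000, 2000, 2000],
    "Thefts": [10, 20, 3, 6, 9],
})

years_civic = list(pd.unique(df_civic["Model Year"]))

results = {}
for civic_year in years_civic:
    f = df_civic["Model Year"] == civic_year  # boolean mask for this year's rows
    # Round the values computed for this year (not car_avg / car_std)
    civic_avg = np.round(df_civic.loc[f, "Thefts"].mean(), 2)
    civic_std = np.round(df_civic.loc[f, "Thefts"].std(), 2)  # sample std, ddof=1
    results[civic_year] = (civic_avg, civic_std, int(np.sum(f)))
    print(civic_year, civic_avg, civic_std, np.sum(f))
```

The loop works, but the groupby version does the same job in one line and is usually faster on large dataframes.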
