Loop through the dataframe in Python

Question

I have a dataframe as follows:

    c1     c2   c3  c4  c5  c6  c7
0   li      1   2   1   3   2   4
1   qian    2   3   3   5   4   2
2   qian    3   5   4   3   2   4
3   li      5   23  23  2   5   2
4   li      2   5   1   4   2   4
5   zhou    3   5   1   1   1   2

I am trying to create a new column c8 that holds the grouped mean. The grouping I want is:

groupby('c1')['c2'].transform('mean')  # c2 can be replaced by any of c3 to c7

My current code looks like this:

lst = [c1, c2, c3, c4,c5, c6, c7]
for i in range(len(lst)):
    res = df.groupby(df['c1'])[i].transform('mean')
    return res
df['c8'] = df[res]

The error says it cannot find c1. Can anyone tell me how to generate the grouped mean and make this loop work?

Answer 1

There are a few problems here:

The error you're receiving is because you've put bare variable names in your list lst. These should be strings (surrounded by quotes); otherwise Python looks for a variable called c1, which doesn't exist.
You're iterating over the indices of lst, not the items of lst itself. On each iteration of your for-loop, i is 0, then 1, then 2, not "c1", "c2", "c3".
You have a return statement inside your for-loop. Inside a function, return exits immediately, so the loop stops after its first iteration; outside a function it is a syntax error.
You can simply update the dataframe on each iteration of the loop instead of storing the result in a temporary res variable.
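
As a minimal sketch of the first two points (the three-element list is just an illustrative subset of the columns), compare what each style of loop actually hands you:

```python
# Column names must be strings; a bare c2 would raise NameError
lst = ["c2", "c3", "c4"]

# range(len(lst)) yields integer positions, not column names
positions = [i for i in range(len(lst))]

# Iterating over the list directly yields the names themselves
names = [column for column in lst]

# enumerate gives both, for the cases where a position is really needed
pairs = list(enumerate(lst))

print(positions)  # [0, 1, 2]
print(names)      # ['c2', 'c3', 'c4']
print(pairs)      # [(0, 'c2'), (1, 'c3'), (2, 'c4')]
```

Since groupby(...)[...] expects column labels, the second form is the one you want here.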

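A working example of your for-loop method would look like this. Note that c1 is left out of the list because it is the grouping key, and that each column is overwritten with its group mean: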

lst = ["c2", "c3", "c4", "c5", "c6", "c7"]
for column in lst:
    df[column] = df.groupby("c1")[column].transform('mean')

print(df)
     c1        c2  c3        c4  c5  c6        c7
0    li  2.666667  10  8.333333   3   3  3.333333
1  qian  2.500000   4  3.500000   4   3  3.000000
2  qian  2.500000   4  3.500000   4   3  3.000000
3    li  2.666667  10  8.333333   3   3  3.333333
4    li  2.666667  10  8.333333   3   3  3.333333
5  zhou  3.000000   5  1.000000   1   1  2.000000

Even better though, you can supply all of the columns you want to calculate the mean of at once without having to explicitly loop:

lst = ["c2", "c3", "c4", "c5", "c6", "c7"]
average_df = df.groupby("c1")[lst].transform("mean") 

print(average_df)
         c2    c3        c4   c5   c6        c7
0  2.666667  10.0  8.333333  3.0  3.0  3.333333
1  2.500000   4.0  3.500000  4.0  3.0  3.000000
2  2.500000   4.0  3.500000  4.0  3.0  3.000000
3  2.666667  10.0  8.333333  3.0  3.0  3.333333
4  2.666667  10.0  8.333333  3.0  3.0  3.333333
5  3.000000   5.0  1.000000  1.0  1.0  2.000000
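
If you'd rather keep the original values and add the means as new columns (the question asks for a new column c8), something along these lines should work. This is only a sketch: it rebuilds the frame from the question so it can run on its own, and the "_mean" suffix and the names group_means and result are my own choices for illustration, not anything from the original code.

```python
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame({
    "c1": ["li", "qian", "qian", "li", "li", "zhou"],
    "c2": [1, 2, 3, 5, 2, 3],
    "c3": [2, 3, 5, 23, 5, 5],
    "c4": [1, 3, 4, 23, 1, 1],
    "c5": [3, 5, 3, 2, 4, 1],
    "c6": [2, 4, 2, 5, 2, 1],
    "c7": [4, 2, 4, 2, 4, 2],
})

lst = ["c2", "c3", "c4", "c5", "c6", "c7"]

# Group means, aligned row by row with df (same index, same length)
group_means = df.groupby("c1")[lst].transform("mean")

# "_mean" is a hypothetical suffix; the original columns stay untouched
result = df.join(group_means.add_suffix("_mean"))

# A single new column, as in the question: the group mean of c2
df["c8"] = df.groupby("c1")["c2"].transform("mean")

print(result)
print(df)
```

Because transform returns one value per original row, the result lines up with df directly and no merge on c1 is needed.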

solution1
1 ACCEPTED 2021-02-12 22:44:11