I have a DataFrame with a MultiIndex. The levels are gender, type, and lastly age. Within each group I want to replace the value of one age with the value of another, so I'm guessing I need to use .groupby().
Below is an example of the problem.
This is the DataFrame I have initially:
Index Gender Type Age Value
0 'f' 'a' 0 'A1'
1 'f' 'a' 1 'A2'
2 'f' 'a' 2 'B1'
3 'f' 'a' 3 'xx'
4 'f' 'a' 4 'B5'
5 'f' 'a' 5 'F3'
6 'f' 'a' 6 'B6'
7 'f' 'a' 7 'Q10'
8 'f' 'a' 8 'A3'
9 'f' 'a' 9 'A1'
10 'f' 'b' 0 'D1'
11 'f' 'b' 1 'V2'
12 'f' 'b' 2 'V1'
13 'f' 'b' 3 'xx'
14 'f' 'b' 4 'G5'
15 'f' 'b' 5 'D3'
16 'f' 'b' 6 'B6'
17 'f' 'b' 7 'Q14'
18 'f' 'b' 8 'A3'
19 'm' 'a' 0 'A1'
20 'm' 'a' 1 'A2'
21 'm' 'a' 2 'B1'
22 'm' 'a' 3 'xx'
23 'm' 'a' 4 'B5'
24 'm' 'a' 5 'A3'
25 'm' 'a' 6 'B6'
26 'm' 'a' 7 'B15'
27 'm' 'a' 8 'A3'
28 'm' 'a' 9 'A1'
29 'm' 'b' 2 'V1'
30 'm' 'b' 3 'xx'
31 'm' 'b' 4 'R5'
32 'm' 'b' 5 'B3'
33 'm' 'b' 6 'W6'
34 'm' 'b' 7 'Q12'
As you can see, in every row where age == 3 the value is xx. I want that value replaced with the value at age 7 within each gender-type group.
That is:
Index Gender Type Age Value
0 'f' 'a' 0 'A1'
1 'f' 'a' 1 'A2'
2 'f' 'a' 2 'B1'
3 'f' 'a' 3 'Q10'
4 'f' 'a' 4 'B5'
5 'f' 'a' 5 'F3'
6 'f' 'a' 6 'B6'
7 'f' 'a' 7 'Q10'
8 'f' 'a' 8 'A3'
9 'f' 'a' 9 'A1'
10 'f' 'b' 0 'D1'
11 'f' 'b' 1 'V2'
12 'f' 'b' 2 'V1'
13 'f' 'b' 3 'Q14'
14 'f' 'b' 4 'G5'
15 'f' 'b' 5 'D3'
16 'f' 'b' 6 'B6'
17 'f' 'b' 7 'Q14'
18 'f' 'b' 8 'A3'
19 'm' 'a' 0 'A1'
20 'm' 'a' 1 'A2'
21 'm' 'a' 2 'B1'
22 'm' 'a' 3 'B15'
23 'm' 'a' 4 'B5'
24 'm' 'a' 5 'A3'
25 'm' 'a' 6 'B6'
26 'm' 'a' 7 'B15'
27 'm' 'a' 8 'A3'
28 'm' 'a' 9 'A1'
29 'm' 'b' 2 'V1'
30 'm' 'b' 3 'Q12'
31 'm' 'b' 4 'R5'
32 'm' 'b' 5 'B3'
33 'm' 'b' 6 'W6'
34 'm' 'b' 7 'Q12'
Note that the DataFrame is not balanced: the range of ages is not the same in every gender-type group, and the groups don't all start and end at the same age. Because age 3 does not sit at the same position within each group, I can't use iloc and need to use loc in some way.
Thanks in advance for your help.
You can define a custom function that processes each group individually:
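def fix(g):
    # Copy the group's Value at Age 7 into its row at Age 3.
    g = g.copy()
    g.loc[g['Age'] == 3, 'Value'] = g.loc[g['Age'] == 7, 'Value'].iloc[0]
    return g

# apply returns a new DataFrame, so assign the result back.
df = df.groupby(['Gender', 'Type'], group_keys=False).apply(fix)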
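As a possible alternative to the groupby/apply approach above, here is a minimal sketch that does the same replacement without a per-group function. It assumes Gender, Type, Age and Value are ordinary columns (if they are index levels, call df.reset_index() first) and that each group has at most one row with Age 7. The small DataFrame and the names keys, source, target and idx are made up for illustration and are not from the question.

```python
import pandas as pd

# Small stand-in for the question's data (same column names, fewer rows).
df = pd.DataFrame({
    "Gender": ["f", "f", "f", "m", "m", "m"],
    "Type":   ["a", "a", "a", "b", "b", "b"],
    "Age":    [2, 3, 7, 3, 4, 7],
    "Value":  ["B1", "xx", "Q10", "xx", "R5", "Q12"],
})

keys = ["Gender", "Type"]

# One entry per group: the Value found at Age 7, indexed by (Gender, Type).
source = df.loc[df["Age"] == 7].set_index(keys)["Value"]

# Boolean mask of the rows to overwrite (Age 3).
target = df["Age"] == 3

# Look up each target row's (Gender, Type) pair in `source` and write it back.
idx = pd.MultiIndex.from_frame(df.loc[target, keys])
df.loc[target, "Value"] = source.reindex(idx).to_numpy()

print(df)
```

Unlike the function-based version, which raises an IndexError when a group has no row with Age 7, this sketch writes NaN into the Age 3 row of such a group, so you may want to check for missing values afterwards.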