How to perform calculations on a subset of a column in a pandas dataframe?

Question

With a dataset such as this:

    famid  birth  age   ht
0       1      1  one  2.8
1       1      1  two  3.4
2       1      2  one  2.9
3       1      2  two  3.8
4       1      3  one  2.2
5       1      3  two  2.9

...where we've got values for a variable ht for different categories of, for example, age, I would like to adjust a subset of the data in df['ht'] where df['age'] == 'one' only. And I would like to do it without creating a new column.

I've tried:

df[df['age']=='one']['ht'] = df[df['age']=='one']['ht']*10**6

But to my mild surprise the numbers don't change. Perhaps that is because the warning "A value is trying to be set on a copy of a slice from a DataFrame" is triggered in the same run. I've also tried df.mask() and df.where(), but to no avail. I'm clearly failing at something very basic here, but I'd really like to know how to do this properly. There are similar-sounding questions, such as "Performing calculations on subset of data frame subset in Python", but the solutions suggested there point towards df.groupby(), and I don't think that is necessarily the right approach here.

Thank you for any suggestions!

Here's a fully reproducible dataset:

import pandas as pd

df = pd.DataFrame({
    'famid': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'birth': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'ht_one': [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
    'ht_two': [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9]
})
df = pd.wide_to_long(df, stubnames='ht', i=['famid', 'birth'], j='age',
                    sep='_', suffix=r'\w+')
df.reset_index(inplace = True)

Answer 1

Let's try this:

df.loc[df['age'] == 'one', 'ht'] *= 10**6

Output:

    famid  birth  age         ht
0       1      1  one  2800000.0
1       1      1  two        3.4
2       1      2  one  2900000.0
3       1      2  two        3.8
4       1      3  one  2200000.0
5       1      3  two        2.9
6       2      1  one  2000000.0
7       2      1  two        3.2
8       2      2  one  1800000.0
9       2      2  two        2.8
10      2      3  one  1900000.0
11      2      3  two        2.4
12      3      1  one  2200000.0
13      3      1  two        3.3
14      3      2  one  2300000.0
15      3      2  two        3.4
16      3      3  one  2100000.0
17      3      3  two        2.9
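A minimal sketch of why this works while the attempt in the question does not, assuming a small stand-in frame with the same column names (the four rows are taken from the question's sample, the variable names mask and subset are illustrative). df[mask] returns a separate object, so changes made through it do not reach df, whereas a single .loc call selects rows and column in one step and writes to df itself.

```python
import pandas as pd

# Small stand-in frame using the first four rows of the question's data.
df = pd.DataFrame({
    'famid': [1, 1, 1, 1],
    'birth': [1, 1, 2, 2],
    'age': ['one', 'two', 'one', 'two'],
    'ht': [2.8, 3.4, 2.9, 3.8],
})

mask = df['age'] == 'one'  # boolean Series, True for the rows to adjust

# Boolean indexing with df[mask] produces a new object. Scaling it
# (here via an explicit copy) leaves the original frame unchanged.
subset = df[mask].copy()
subset['ht'] *= 10**6
print(df['ht'].tolist())  # still the original values

# One .loc call with both the row selector and the column label
# assigns into df directly.
df.loc[mask, 'ht'] *= 10**6
print(df)
```

The same pattern should apply to any boolean condition, for example combining several conditions with & and | inside the row selector.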

Answer 2

Here is a way:

df.assign(ht = df['ht'].mask(df['age'].isin(['one']),df['ht'].mul(10**6)))

Because this uses isin(), more values from the age column can be added to the list. Note that assign() returns a new DataFrame rather than modifying df in place.

Output:

    famid  birth  age         ht
0       1      1  one  2800000.0
1       1      1  two        3.4
2       1      2  one  2900000.0
3       1      2  two        3.8
4       1      3  one  2200000.0
5       1      3  two        2.9
6       2      1  one  2000000.0
7       2      1  two        3.2
8       2      2  one  1800000.0
9       2      2  two        2.8
10      2      3  one  1900000.0
11      2      3  two        2.4
12      3      1  one  2200000.0
13      3      1  two        3.3
14      3      2  one  2300000.0
15      3      2  two        3.4
16      3      3  one  2100000.0
17      3      3  two        2.9
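As a rough illustration of the isin() point, here is a sketch that selects two categories at once. The category 'three' and the names targets, selected, result and result_where are assumptions made for the example and do not appear in the question's data. It also shows the equivalent where() form, which keeps values where the condition is True and therefore needs the inverted condition.

```python
import pandas as pd

# Stand-in frame; the 'three' category is hypothetical.
df = pd.DataFrame({
    'age': ['one', 'two', 'three', 'one'],
    'ht': [2.8, 3.4, 2.9, 2.0],
})

targets = ['one', 'three']          # categories to rescale
selected = df['age'].isin(targets)  # True where age is in targets

# mask() replaces values where the condition is True.
result = df.assign(ht=df['ht'].mask(selected, df['ht'].mul(10**6)))

# where() keeps values where the condition is True, so invert it.
result_where = df.assign(ht=df['ht'].where(~selected, df['ht'].mul(10**6)))

print(result)
print(result.equals(result_where))

# assign() returned new frames; df itself is unchanged unless reassigned,
# e.g. df = result.
print(df['ht'].tolist())
```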

2 answers

solution1
2 ACCEPTED 2022-12-03 22:41:49

solution2
1 2022-12-03 22:47:03
