How to perform calculations on a subset of a column in a pandas dataframe?

With a dataset such as this:

    famid  birth  age   ht
0       1      1  one  2.8
1       1      1  two  3.4
2       1      2  one  2.9
3       1      2  two  3.8
4       1      3  one  2.2
5       1      3  two  2.9

...where we've got values for a variable ht for different categories of, for example, age, I would like to adjust a subset of the data in df['ht'] where df['age'] == 'one' only. And I would like to do it without creating a new column.

I've tried:

df[df['age']=='one']['ht'] = df[df['age']=='one']['ht']*10**6

But to my mild surprise the numbers don't change. Maybe that is because the warning "A value is trying to be set on a copy of a slice from a DataFrame" is triggered in the same run. I've also tried df.mask() and df.where(), but to no avail. I'm clearly failing at something very basic here, but I'd really like to know how to do this properly. There are similar-sounding questions, such as "Performing calculations on subset of data frame subset in Python", but the solutions suggested there point towards df.groupby(), and I don't think that is necessarily the right approach here.

Thank you for any suggestions!

Here's a fully reproducible dataset:

import pandas as pd

df = pd.DataFrame({
    'famid': [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'birth': [1, 2, 3, 1, 2, 3, 1, 2, 3],
    'ht_one': [2.8, 2.9, 2.2, 2, 1.8, 1.9, 2.2, 2.3, 2.1],
    'ht_two': [3.4, 3.8, 2.9, 3.2, 2.8, 2.4, 3.3, 3.4, 2.9]
})
df = pd.wide_to_long(df, stubnames='ht', i=['famid', 'birth'], j='age',
                    sep='_', suffix=r'\w+')
df.reset_index(inplace=True)

Let's try this:

df.loc[df['age'] == 'one', 'ht'] *= 10**6

Output:

    famid  birth  age         ht
0       1      1  one  2800000.0
1       1      1  two        3.4
2       1      2  one  2900000.0
3       1      2  two        3.8
4       1      3  one  2200000.0
5       1      3  two        2.9
6       2      1  one  2000000.0
7       2      1  two        3.2
8       2      2  one  1800000.0
9       2      2  two        2.8
10      2      3  one  1900000.0
11      2      3  two        2.4
12      3      1  one  2200000.0
13      3      1  two        3.3
14      3      2  one  2300000.0
15      3      2  two        3.4
16      3      3  one  2100000.0
17      3      3  two        2.9
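
A minimal sketch of why the single .loc call works where the chained version in the question does not. The small frame below is a cut-down stand-in for the question's data, and the name is_one is only illustrative:

```python
import pandas as pd

# Cut-down stand-in for the question's data (assumed for illustration)
df = pd.DataFrame({
    'age': ['one', 'two', 'one', 'two'],
    'ht': [2.8, 3.4, 2.9, 3.8],
})

# Boolean mask with one entry per row
is_one = df['age'] == 'one'

# Chained indexing: df[is_one] builds a separate object first, so the
# assignment is made on that temporary object and df is left unchanged.
# Left commented out because it only triggers the warning from the question.
# df[is_one]['ht'] = df[is_one]['ht'] * 10**6

# One .loc call selects rows and column together, so the write goes to df itself
df.loc[is_one, 'ht'] *= 10**6

print(df)
```

Rows where age is 'two' keep their original values, and no new column is created.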

Here is a way:

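df.assign(ht=df['ht'].mask(df['age'].isin(['one']), df['ht'].mul(10**6)))

By using isin(), more values from the age column can be added to the condition. Note that assign() returns a new DataFrame rather than modifying df, so assign the result back to df if you want to keep it.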

Output:

    famid  birth  age         ht
0       1      1  one  2800000.0
1       1      1  two        3.4
2       1      2  one  2900000.0
3       1      2  two        3.8
4       1      3  one  2200000.0
5       1      3  two        2.9
6       2      1  one  2000000.0
7       2      1  two        3.2
8       2      2  one  1800000.0
9       2      2  two        2.8
10      2      3  one  1900000.0
11      2      3  two        2.4
12      3      1  one  2200000.0
13      3      1  two        3.3
14      3      2  one  2300000.0
15      3      2  two        3.4
16      3      3  one  2100000.0
17      3      3  two        2.9
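
A short sketch of extending the isin() condition to several categories. The 'three' category does not exist in the question's data and is made up here purely to show the pattern, as is the name selected:

```python
import pandas as pd

# 'three' is a hypothetical extra category, not part of the original dataset
df = pd.DataFrame({
    'age': ['one', 'two', 'three', 'one'],
    'ht': [2.8, 3.4, 2.5, 2.9],
})

# True for every row whose age is in the list
selected = df['age'].isin(['one', 'three'])

# mask() replaces values where the condition is True and keeps the rest;
# the result of assign() is bound back to df so the change is kept
df = df.assign(ht=df['ht'].mask(selected, df['ht'] * 10**6))

print(df)
```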
