Pandas create new column based on first unique values of existing column

Question

I'm trying to add a new column to a dataframe that keeps only the first occurrence of each value from an existing column. The new column will hold fewer values, with np.nan wherever a duplicate would have been.

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,3,4,5], 'b':[3,4,3,4,5]})
df

    a   b
0   1   3
1   2   4
2   3   3
3   4   4
4   5   5

Goal:

    a   b   c
0   1   3   3
1   2   4   4
2   3   3   nan
3   4   4   nan
4   5   5   5

I've tried:

df['c'] = np.where(df['b'].unique(), df['b'], np.nan)

It throws: operands could not be broadcast together with shapes (3,) (5,) ()

Answer 1

`mask` + `duplicated`

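You can use pandas methods for masking a series. `duplicated` flags every repeat of a value after its first occurrence, and `mask` replaces the flagged values with NaN: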

df['c'] = df['b'].mask(df['b'].duplicated())

print(df)

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0
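
To make the two steps visible, here is a minimal sketch that prints the intermediate boolean series and also tries the `keep` argument of `duplicated`. The column names `c_last` and `c_none` are only illustrative and are not part of the question.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# duplicated() returns one boolean per row; by default the first
# occurrence of each value is not flagged
dup = df['b'].duplicated()
print(dup.tolist())  # [False, False, True, True, False]

# mask() replaces the values where the condition is True with NaN,
# which is also why the new column becomes float
df['c'] = df['b'].mask(dup)

# keep='last' leaves the last occurrence unflagged instead of the first
df['c_last'] = df['b'].mask(df['b'].duplicated(keep='last'))

# keep=False flags every occurrence of a repeated value
df['c_none'] = df['b'].mask(df['b'].duplicated(keep=False))

print(df)
```

With the default `keep='first'` this reproduces the output shown above; the other two settings are only worth reaching for if a different occurrence should survive.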

Answer 2

Use `duplicated` with `np.where`:

df['c'] = np.where(df['b'].duplicated(), np.nan, df['b'])

Or:

df['c'] = df['b'].where(~df['b'].duplicated(), np.nan)

print(df)
   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0
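
As a rough illustration of why the attempt in the question fails while these versions work, the sketch below compares the shapes involved and checks that the `np.where`, `where` and `mask` variants agree. The variable names (`is_dup`, `via_np_where`, and so on) are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# unique() returns only the distinct values, so it is shorter than the
# column and np.where cannot broadcast it against df['b']
print(df['b'].unique().shape)  # (3,)
print(df['b'].shape)           # (5,)

# duplicated() returns one boolean per row, so the shapes line up
is_dup = df['b'].duplicated()

# np.where returns a plain NumPy array; wrap it to get the index back
via_np_where = pd.Series(np.where(is_dup, np.nan, df['b']), index=df.index)

# where() keeps values where the condition is True (NaN is the default
# replacement); mask() is the same call with the condition inverted
via_where = df['b'].where(~is_dup)
via_mask = df['b'].mask(is_dup)

print(pd.concat([via_np_where, via_where, via_mask], axis=1))
```

All three columns in the printed frame should hold the same values, so the choice between them is mostly a matter of taste.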

Answer 3

ppg wrote:

df['c'] = df['b'].mask(df['b'].duplicated())

print(df)

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0

I like the code, but the last row should also give NaN:

    0  1  3  3.0
    1  2  4  4.0
    2  3  3  NaN
    3  4  4  NaN
    4  5  5  NaN

3 answers

solution1
3 ACCEPTED 2018-11-14 17:42:13

`mask` + `duplicated`

solution2
2 2018-11-14 17:43:29

solution3
0 2018-11-14 18:03:16
