Pandas: create a new column from the first unique values of an existing column

Question

I'm trying to add a new column to a dataframe that holds only the unique values from an existing column. The new column will have fewer values, with np.nan wherever a duplicate would have been.

import pandas as pd
import numpy as np

df = pd.DataFrame({'a':[1,2,3,4,5], 'b':[3,4,3,4,5]})
df

    a   b
0   1   3
1   2   4
2   3   3
3   4   4
4   5   5

Goal:

    a   b   c
0   1   3   3
1   2   4   4
2   3   3   nan
3   4   4   nan
4   5   5   5

I've tried:

df['c'] = np.where(df['b'].unique(), df['b'], np.nan)

It throws: operands could not be broadcast together with shapes (3,) (5,) ()
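For context, the shapes in that message suggest the cause: `unique()` returns only the 3 distinct values, while the column has 5 rows, so `np.where` cannot line the condition up with the data. The sketch below reproduces the mismatch and shows a row-aligned boolean mask from `duplicated()`; the variable names `uniques`, `is_repeat`, and `broadcast_failed` are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# unique() drops repeats, so its result no longer lines up row by row with the column
uniques = df['b'].unique()
print(uniques.shape)     # (3,)
print(df['b'].shape)     # (5,)

# np.where needs a condition that broadcasts against the data (one flag per row here)
try:
    np.where(uniques, df['b'], np.nan)
    broadcast_failed = False
except ValueError as exc:
    broadcast_failed = True
    print(exc)

# duplicated() returns one boolean per row, aligned with the original index
is_repeat = df['b'].duplicated()
print(is_repeat.tolist())   # [False, False, True, True, False]
```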

Answer 1

`mask` + `duplicated`

You can use the Pandas methods for masking a series:

df['c'] = df['b'].mask(df['b'].duplicated())

print(df)

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0
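The output shows `3.0` rather than `3` because NaN is a float, so the masked column is upcast to float64. If integer values matter, one possible workaround is the nullable `Int64` extension dtype. The sketch below assumes a pandas version that provides that dtype; `c_float` and `c_int` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})
is_repeat = df['b'].duplicated()

# Default behaviour: NaN is a float, so the masked column becomes float64
c_float = df['b'].mask(is_repeat)
print(c_float.dtype)    # float64

# Assumption: the nullable 'Int64' dtype is available; masked rows become <NA>
c_int = df['b'].astype('Int64').mask(is_repeat)
print(c_int.dtype)      # Int64
print(c_int)
```

Whether missing values should be NaN or `<NA>` depends on what downstream code expects, so treat this as an option rather than a replacement for the answer above.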

Answer 2

Use duplicated with np.where:

df['c'] = np.where(df['b'].duplicated(),np.nan,df['b'])

Or:

df['c'] = df['b'].where(~df['b'].duplicated(),np.nan)

print(df)
   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0
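As a quick sanity check, the `np.where`, `Series.where`, and `Series.mask` variants can be compared side by side. This is only a sketch on the sample data; the names `via_np_where`, `via_where`, and `via_mask` are made up for the comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})
is_repeat = df['b'].duplicated()

# np.where returns a plain array, so wrap it to compare it with the Series results
via_np_where = pd.Series(np.where(is_repeat, np.nan, df['b']),
                         index=df.index, name='b')
via_where = df['b'].where(~is_repeat)   # keep values where the condition is True
via_mask = df['b'].mask(is_repeat)      # blank out values where the condition is True

print(via_np_where.equals(via_where))   # True
print(via_where.equals(via_mask))       # True
```

Note that `where` keeps rows where the condition holds while `mask` replaces them, which is why one call needs `~` and the other does not.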

Answer 3

ppg wrote:

df['c'] = df['b'].mask(df['b'].duplicated())

print(df)

   a  b    c
0  1  3  3.0
1  2  4  4.0
2  3  3  NaN
3  4  4  NaN
4  5  5  5.0

I like the code, but the last column should also give NaN:

    0  1  3  3.0
    1  2  4  4.0
    2  3  3  NaN
    3  4  4  NaN
    4  5  5  NaN
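The answer does not say which rule would produce that output. One guess is that a value should be kept only at its first occurrence and only if it repeats somewhere in the column, which would blank out the lone 5. Under that assumption, a sketch using `duplicated(keep=False)` could look like this; `has_repeat` and `is_first` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# True for every row whose value occurs more than once anywhere in the column
has_repeat = df['b'].duplicated(keep=False)
# True for the first occurrence of each value
is_first = ~df['b'].duplicated()

# Assumed rule: keep a value only at its first occurrence, and only if it repeats
df['c'] = df['b'].where(has_repeat & is_first)
print(df)
```

This differs from the goal stated in the question, where the final 5 is kept, so it only applies if that stricter rule is really what is wanted.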

3 answers

Answer 1: votes 3, accepted, 2018-11-14 17:42:13
Answer 2: votes 2, 2018-11-14 17:43:29
Answer 3: votes 0, 2018-11-14 18:03:16
