new column in pandas DataFrame based on unique values (lists) of an existing column
Pandas create new column based on first unique values of existing column
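I am trying to add a new column to a DataFrame that contains only the unique values from an existing column. The new column will hold fewer actual values, with np.nan appearing wherever a value is a duplicate.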
import pandas as pd
import numpy as np
df = pd.DataFrame({'a':[1,2,3,4,5], 'b':[3,4,3,4,5]})
df
a b
0 1 3
1 2 4
2 3 3
3 4 4
4 5 5
Goal:
a b c
0 1 3 3
1 2 4 4
2 3 3 nan
3 4 4 nan
4 5 5 5
I tried:
df['c'] = np.where(df['b'].unique(), df['b'], np.nan)
It throws: operands could not be broadcast together with shapes (3,) (5,) ()
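For context, the shapes in that message can be reproduced directly. The sketch below reuses the question's DataFrame and only inspects array lengths; the variable names `uniques` and `is_repeat` are illustrative, not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# unique() returns the distinct values only, so it is shorter than the column
uniques = df['b'].unique()
print(uniques.shape)    # (3,)
print(df['b'].shape)    # (5,)

# np.where needs a condition that lines up row by row with the column.
# duplicated() returns one boolean per row, so its length matches.
is_repeat = df['b'].duplicated()
print(is_repeat.tolist())   # [False, False, True, True, False]
```

The condition passed to np.where has to broadcast against the other two arguments, which a length-3 array cannot do against a length-5 column.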
mask + duplicated
You can use pandas methods to mask the Series:
df['c'] = df['b'].mask(df['b'].duplicated())
print(df)
a b c
0 1 3 3.0
1 2 4 4.0
2 3 3 NaN
3 4 4 NaN
4 5 5 5.0
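Note that column c comes out as float (3.0, 4.0, ...) because NaN is a float value. If integer values are preferred, one possible variation is to cast to the nullable 'Int64' dtype before masking. This is a sketch that assumes a pandas version with nullable integer support; it is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# Cast to the nullable integer dtype first, then hide repeated values.
# Masked positions become <NA> and the remaining values stay integers.
df['c'] = df['b'].astype('Int64').mask(df['b'].duplicated())
print(df)
print(df['c'].dtype)
```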
np.where with duplicated
Use:
df['c'] = np.where(df['b'].duplicated(),np.nan,df['b'])
Or:
df['c'] = df['b'].where(~df['b'].duplicated(),np.nan)
print(df)
a b c
0 1 3 3.0
1 2 4 4.0
2 3 3 NaN
3 4 4 NaN
4 5 5 5.0
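Which occurrence survives is controlled by the keep parameter of duplicated. The default, keep='first', leaves the first occurrence untouched, as in the outputs above. A short sketch of the other two settings on the same data (the column names c_last and c_none are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [3, 4, 3, 4, 5]})

# keep='last': earlier occurrences are flagged, the last one is kept
df['c_last'] = df['b'].mask(df['b'].duplicated(keep='last'))

# keep=False: every value that occurs more than once is flagged
df['c_none'] = df['b'].mask(df['b'].duplicated(keep=False))

print(df)
```

With this data, c_last keeps the values in rows 2, 3 and 4, while c_none keeps only the 5 in row 4.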
ppg wrote:
df['c'] = df['b'].mask(df['b'].duplicated())
print(df)
a b c
0 1 3 3.0
1 2 4 4.0
2 3 3 NaN
3 4 4 NaN
4 5 5 5.0
I like the code, but the last row of the new column should also give NaN:
0 1 3 3.0
1 2 4 4.0
2 3 3 NaN
3 4 4 NaN
4 5 5 NaN