Randomly Filling NaN Values of a Column with Non-Null String Values
I am working with the following DataFrame, which contains string values:
maturity_rating
0 NaN
1 Rated: 18+ (R)
2 Rated: 7+ (PG)
3 NaN
4 Rated: 18+ (R)
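I am trying to randomly fill the NaN values with the other non-null values present in the same column.
Since the fill is random, my expected output would look something like this: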
maturity_rating
0 Rated: 7+ (PG)
1 Rated: 18+ (R)
2 Rated: 7+ (PG)
3 Rated: 18+ (R)
4 Rated: 18+ (R)
I tried the following snippet:
df["maturity_rating"].fillna(lambda x: random.choice(df[df['maturity_rating'] != np.nan]["maturity_rating"]), inplace =True)
However, when I check the unique values, the NaNs have been filled with the lambda object itself:
df["maturity_rating"].unique()
Out[117]:
array([<function <lambda> at 0x7fe8d0431a60>, 'Rated: 18+ (R)',
'Rated: 7+ (PG)', 'Rated: 13+ (PG-13)', 'Rated: All (G)',
'Rated: 16+'], dtype=object)
Please advise.
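Two things go wrong in the attempted snippet. fillna expects a fill value (a scalar, dict, Series, or DataFrame), so the lambda is stored in the column as if it were a scalar rather than being called, which is exactly what the unique() output shows. Separately, `df['maturity_rating'] != np.nan` never filters anything out, because NaN compares unequal to everything, itself included. The sketch below uses a small toy Series (not the original data) to illustrate the comparison pitfall:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, "Rated: 18+ (R)", np.nan])

# NaN is never equal to anything, not even itself, so this
# inequality is True for every element and removes nothing.
neq = s != np.nan
print(neq.tolist())        # [True, True, True]

# Series.notna() correctly flags only the non-missing entries.
print(s.notna().tolist())  # [False, True, False]
```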
Let's try np.random.choice:
m = df['maturity_rating'].isna()
df.loc[m, 'maturity_rating'] = np.random.choice(df.loc[~m, 'maturity_rating'], m.sum())
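For anyone who wants to run this end to end, here is a self-contained version that rebuilds the five-row sample frame from the question. It assumes NumPy's newer Generator API (`np.random.default_rng(...).choice`) in place of the legacy `np.random.choice`, with an arbitrary seed of 0 so the fill is reproducible; both draw uniformly, with replacement, from the array they are given.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "maturity_rating": [np.nan, "Rated: 18+ (R)", "Rated: 7+ (PG)",
                        np.nan, "Rated: 18+ (R)"]
})

# Seeded Generator so the random fill is reproducible (seed 0 is arbitrary).
rng = np.random.default_rng(0)

m = df["maturity_rating"].isna()                  # True where the value is missing
pool = df.loc[~m, "maturity_rating"].to_numpy()   # the existing non-null ratings
# One random draw from the pool for each NaN row.
df.loc[m, "maturity_rating"] = rng.choice(pool, size=m.sum())

print(df)
```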
Details:
Use Series.isna to create a boolean mask that is True wherever the maturity_rating column contains NaN:
>>> m
0 True
1 False
2 False
3 True
4 False
Name: maturity_rating, dtype: bool
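The mask does double duty: `m.sum()` counts the NaNs (each True counts as 1), which is why it serves as the sample size, and `~m` flips it to pick out the rows to sample from. A minimal sketch on the same values as the question:

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, "Rated: 18+ (R)", "Rated: 7+ (PG)",
               np.nan, "Rated: 18+ (R)"], name="maturity_rating")

m = s.isna()
print(int(m.sum()))          # 2: summing booleans counts the NaNs
print((~m).tolist())         # [False, True, True, False, True]: ~ inverts the mask
print(s[~m].index.tolist())  # [1, 2, 4]: the rows available for sampling
```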
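Use boolean indexing with the inverted mask ~m to select the non-NaN elements of the maturity_rating column, then use np.random.choice to randomly sample m.sum() of these elements (one per NaN):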
>>> df.loc[~m, 'maturity_rating']
1 Rated: 18+ (R)
2 Rated: 7+ (PG)
4 Rated: 18+ (R)
Name: maturity_rating, dtype: object
>>> np.random.choice(df.loc[~m, 'maturity_rating'], m.sum())
array(['Rated: 18+ (R)', 'Rated: 7+ (PG)'], dtype=object)
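One consequence of sampling from `df.loc[~m, 'maturity_rating']` as-is: duplicates are kept, so each rating is drawn in proportion to how often it already appears (here 'Rated: 18+ (R)' has a 2-in-3 chance). If equal odds per distinct rating are preferred, sampling from `.unique()` is one option. The sketch below assumes a seeded Generator and an arbitrary 30,000 draws purely to make the proportions visible:

```python
import numpy as np
import pandas as pd

pool = pd.Series(["Rated: 18+ (R)", "Rated: 7+ (PG)", "Rated: 18+ (R)"])
rng = np.random.default_rng(42)

# Sampling the raw pool: the duplicate makes "18+ (R)" twice as likely.
draws = rng.choice(pool.to_numpy(), size=30_000)
share_r = np.mean(draws == "Rated: 18+ (R)")
print(f"{share_r:.2f}")          # roughly 0.67

# Sampling the distinct values gives each rating equal odds.
uniform = rng.choice(pool.unique(), size=30_000)
share_r_uniform = np.mean(uniform == "Rated: 18+ (R)")
print(f"{share_r_uniform:.2f}")  # roughly 0.50
```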
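Finally, use boolean indexing to fill the NaN values in the maturity_rating column with the sampled choices. Because the sampling is random, every call draws a new sample, so the frame below does not necessarily match the sample shown above: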
>>> df
maturity_rating
0 Rated: 18+ (R)
1 Rated: 18+ (R)
2 Rated: 7+ (PG)
3 Rated: 18+ (R)
4 Rated: 18+ (R)
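To reuse the same idea on several columns, or on a column that might be entirely NaN (where there is nothing to sample from and choice raises a ValueError), the steps can be wrapped in a small function. `fill_na_randomly` is a hypothetical name used only for illustration; it returns a new Series rather than modifying the frame in place.

```python
from typing import Optional

import numpy as np
import pandas as pd


def fill_na_randomly(s: pd.Series,
                     rng: Optional[np.random.Generator] = None) -> pd.Series:
    """Hypothetical helper: return a copy of `s` with each NaN replaced by a
    random draw (with replacement) from the Series' own non-null values."""
    rng = rng if rng is not None else np.random.default_rng()
    out = s.copy()
    mask = out.isna()
    pool = out[~mask].to_numpy()
    # Nothing to fill, or nothing to sample from: return the copy unchanged.
    if not mask.any() or pool.size == 0:
        return out
    out.loc[mask] = rng.choice(pool, size=int(mask.sum()))
    return out


df = pd.DataFrame({"maturity_rating": [np.nan, "Rated: 18+ (R)", "Rated: 7+ (PG)",
                                       np.nan, "Rated: 18+ (R)"]})
df["maturity_rating"] = fill_na_randomly(df["maturity_rating"],
                                         np.random.default_rng(1))
all_nan = fill_na_randomly(pd.Series([np.nan, np.nan]))

print(df)
print(all_nan.isna().all())  # True: an all-NaN column is left as-is
```

For a whole frame, `df.apply(fill_na_randomly)` would call the helper once per column.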
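Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.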