
Randomly Filling NaN values of a Column with Non-Null String Values

I'm working with the following DataFrame, which contains str values:

maturity_rating
0   NaN
1   Rated: 18+ (R)
2   Rated: 7+ (PG)
3   NaN
4   Rated: 18+ (R)

I'm trying to randomly fill the NaN values with the other non-null values present in the same column.

My expected output is:

maturity_rating
0   Rated: 7+ (PG)
1   Rated: 18+ (R)
2   Rated: 7+ (PG)
3   Rated: 18+ (R)
4   Rated: 18+ (R)

I tried the following snippet:

df["maturity_rating"].fillna(lambda x: random.choice(df[df['maturity_rating'] != np.nan]["maturity_rating"]), inplace =True)

However, when I check the unique values, the NaN values have been filled with the lambda object itself:


df["maturity_rating"].unique()

Out[117]:
array([<function <lambda> at 0x7fe8d0431a60>, 'Rated: 18+ (R)',
       'Rated: 7+ (PG)', 'Rated: 13+ (PG-13)', 'Rated: All (G)',
       'Rated: 16+'], dtype=object)

Please advise.
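Two things go wrong in the attempt above. First, fillna does not call a function passed as its value; as the unique() output shows, the lambda itself was stored as the fill value. Second, the filter df['maturity_rating'] != np.nan never excludes anything, because NaN compares unequal to everything, including itself. The sketch below uses a small hypothetical Series s (object dtype, like the column in the question) to show the second point and the isna()/notna() checks that should be used instead:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df["maturity_rating"], object dtype as in the question
s = pd.Series([np.nan, "Rated: 18+ (R)", "Rated: 7+ (PG)", np.nan], dtype=object)

# NaN is never equal to anything, so "!= np.nan" is True for every row,
# including the NaN rows: used as a filter, it keeps everything.
always_true = s != np.nan
print(always_true.tolist())  # [True, True, True, True]

# isna()/notna() are the reliable way to detect missing values.
print(s.isna().tolist())   # [True, False, False, True]
print(s.notna().tolist())  # [False, True, True, False]
```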

Let's try np.random.choice:

import numpy as np
m = df['maturity_rating'].isna()  # True where the rating is NaN
df.loc[m, 'maturity_rating'] = np.random.choice(df.loc[~m, 'maturity_rating'], m.sum())  # one random non-null value per NaN
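For reference, here is a self-contained version of the same approach that rebuilds the sample frame from the question. The np.random.seed call is an addition, only there to make the random draw repeatable; drop it if you want different fills on each run.

```python
import numpy as np
import pandas as pd

# Sample data from the question (object dtype, as in the question's output)
df = pd.DataFrame(
    {"maturity_rating": [np.nan, "Rated: 18+ (R)", "Rated: 7+ (PG)",
                         np.nan, "Rated: 18+ (R)"]},
    dtype=object,
)

np.random.seed(0)  # optional: makes the random fill reproducible

m = df["maturity_rating"].isna()        # True where the rating is missing
pool = df.loc[~m, "maturity_rating"]    # the non-null ratings to draw from
# Draw one value (with replacement, the default) for every missing row
# and assign the draws back to exactly those rows
df.loc[m, "maturity_rating"] = np.random.choice(pool, m.sum())

print(df)
```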

Details:

Use Series.isna to create a boolean mask m that is True wherever the maturity_rating column contains a NaN value:

>>> m

0     True
1    False
2    False
3     True
4    False
Name: maturity_rating, dtype: bool

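Use boolean indexing with the inverted mask ~m to select the non-NaN elements of the maturity_rating column, then use np.random.choice to draw m.sum() random samples from them (m.sum() is the number of NaN rows, since each True counts as 1):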

>>> df.loc[~m, 'maturity_rating']

1    Rated: 18+ (R)
2    Rated: 7+ (PG)
4    Rated: 18+ (R)
Name: maturity_rating, dtype: object

>>> np.random.choice(df.loc[~m, 'maturity_rating'], m.sum())

array(['Rated: 18+ (R)', 'Rated: 7+ (PG)'], dtype=object)
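Note that np.random.choice draws uniformly over the rows of the non-null Series, not over its distinct labels, so a rating that appears more often is proportionally more likely to be picked: in the sample data, 'Rated: 18+ (R)' fills each gap with probability 2/3. If you would rather give every distinct rating the same chance (an assumption about the desired behaviour, not part of the answer above), one option is to sample from .unique() instead:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"maturity_rating": [np.nan, "Rated: 18+ (R)", "Rated: 7+ (PG)",
                         np.nan, "Rated: 18+ (R)"]},
    dtype=object,
)
m = df["maturity_rating"].isna()
pool = df.loc[~m, "maturity_rating"]

# Probability of each rating when sampling rows of `pool`, as the answer does
row_weights = pool.value_counts(normalize=True)
print(row_weights)

# Alternative: sample the distinct ratings, each with equal probability
distinct = pool.unique()
draws = np.random.choice(distinct, m.sum())
print(draws)
```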

Finally, use boolean indexing with m to fill the NaN values in the maturity_rating column with the sampled values above:

>>> df

  maturity_rating
0  Rated: 18+ (R)
1  Rated: 18+ (R)
2  Rated: 7+ (PG)
3  Rated: 18+ (R)
4  Rated: 18+ (R)
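If you would rather keep the fillna call from the question, fillna also accepts a Series as its value and fills each missing position with the value at the matching index label. A minimal sketch along those lines, reusing the same mask-and-sample idea (the name fills is just illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"maturity_rating": [np.nan, "Rated: 18+ (R)", "Rated: 7+ (PG)",
                         np.nan, "Rated: 18+ (R)"]},
    dtype=object,
)

m = df["maturity_rating"].isna()
# One random non-null value per missing row, labelled with those rows' index
fills = pd.Series(
    np.random.choice(df.loc[~m, "maturity_rating"], m.sum()),
    index=df.index[m.to_numpy()],
)
# fillna aligns `fills` on the index, so only the NaN rows are replaced
df["maturity_rating"] = df["maturity_rating"].fillna(fills)
print(df)
```

Assigning the result back, instead of calling fillna(..., inplace=True) on df["maturity_rating"], avoids the chained-assignment warnings that recent pandas versions can emit for in-place methods on a column selection.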


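Disclaimer: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM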