Randomly fill a column's NaN values with non-null string values

Question

I am working with the following DataFrame, which contains string values:

maturity_rating
0   NaN
1   Rated: 18+ (R)
2   Rated: 7+ (PG)
3   NaN
4   Rated: 18+ (R)

I am trying to randomly fill the NaN values with other non-null values that are present in the same column.

My expected output is:

maturity_rating
0   Rated: 7+ (PG)
1   Rated: 18+ (R)
2   Rated: 7+ (PG)
3   Rated: 18+ (R)
4   Rated: 18+ (R)

I tried the following snippet:

df["maturity_rating"].fillna(lambda x: random.choice(df[df['maturity_rating'] != np.nan]["maturity_rating"]), inplace =True)

However, when I check the unique values, the NaN values have been filled with the lambda object itself:


df["maturity_rating"].unique()

Out[117]:
array([<function <lambda> at 0x7fe8d0431a60>, 'Rated: 18+ (R)',
       'Rated: 7+ (PG)', 'Rated: 13+ (PG-13)', 'Rated: All (G)',
       'Rated: 16+'], dtype=object)

Please advise.

Answer 1

Let's try np.random.choice:

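# m is True where maturity_rating is NaN
m = df['maturity_rating'].isna()
# draw m.sum() replacements (with replacement) from the non-NaN values
df.loc[m, 'maturity_rating'] = np.random.choice(df.loc[~m, 'maturity_rating'], m.sum())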
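For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the five-row sample from the question; the `original` copy is a name added here only so the untouched rows can be checked afterwards.

```python
import numpy as np
import pandas as pd

# Sample data from the question (object dtype, two NaN rows).
df = pd.DataFrame(
    {"maturity_rating": [np.nan, "Rated: 18+ (R)", "Rated: 7+ (PG)", np.nan, "Rated: 18+ (R)"]},
    dtype=object,
)
original = df["maturity_rating"].copy()  # kept only for the checks below

# Boolean mask of the rows to fill.
m = df["maturity_rating"].isna()

# Sample one replacement per NaN row from the non-NaN values.
pool = df.loc[~m, "maturity_rating"].to_numpy()
df.loc[m, "maturity_rating"] = np.random.choice(pool, int(m.sum()))

print(df)
```

Because the draw is random, the two filled rows can differ from run to run; the rows that already had a value are left as they were.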

Details:

Use Series.isna to create a boolean mask that is True wherever the maturity_rating column contains NaN:

>>> m

0     True
1    False
2    False
3     True
4    False
Name: maturity_rating, dtype: bool
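It may help to see why the mask is built with Series.isna rather than with the `!= np.nan` comparison used in the question's attempt. NaN compares unequal to everything, including itself, so that comparison is True for every row and filters nothing out. A small sketch, assuming the same sample column (the variable names are illustrative):

```python
import numpy as np
import pandas as pd

s = pd.Series(
    [np.nan, "Rated: 18+ (R)", "Rated: 7+ (PG)", np.nan, "Rated: 18+ (R)"],
    name="maturity_rating",
    dtype=object,
)

# NaN != NaN is True, so this mask does not exclude the NaN rows.
not_equal_mask = s != np.nan
print(not_equal_mask.tolist())  # [True, True, True, True, True]

# isna() is the check that actually identifies the missing rows.
isna_mask = s.isna()
print(isna_mask.tolist())  # [True, False, False, True, False]
```

Separately, as the unique() output in the question shows, fillna stored the lambda as the fill value instead of calling it, which is why the mask-and-assign approach is used here.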

Use boolean indexing with the inverted mask ~m to select the non-NaN elements of the maturity_rating column, then use np.random.choice to sample randomly from those elements:

>>> df.loc[~m, 'maturity_rating']

1    Rated: 18+ (R)
2    Rated: 7+ (PG)
4    Rated: 18+ (R)
Name: maturity_rating, dtype: object

>>> np.random.choice(df.loc[~m, 'maturity_rating'], m.sum())

array(['Rated: 18+ (R)', 'Rated: 7+ (PG)'], dtype=object)
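Note that np.random.choice samples with replacement by default, so a rating that appears more often in the column is proportionally more likely to be drawn. If reproducible fills are needed, one option (not part of the original answer) is a seeded NumPy Generator. The helper name `sample_fill_values` and the seed value are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# The non-NaN values selected in the step above.
pool = pd.Series(
    ["Rated: 18+ (R)", "Rated: 7+ (PG)", "Rated: 18+ (R)"],
    name="maturity_rating",
    dtype=object,
)


def sample_fill_values(pool, n, seed):
    """Draw n values from pool (with replacement) using a seeded generator."""
    rng = np.random.default_rng(seed)
    return rng.choice(pool.to_numpy(), size=n)


first = sample_fill_values(pool, 2, seed=42)
second = sample_fill_values(pool, 2, seed=42)

# The same seed yields the same draw.
print(first, (first == second).all())
```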

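Finally, use boolean indexing to fill the NaN values in the maturity_rating column with the sampled values. Because the sampling is random, the filled values differ from run to run: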

>>> df

  maturity_rating
0  Rated: 18+ (R)
1  Rated: 18+ (R)
2  Rated: 7+ (PG)
3  Rated: 18+ (R)
4  Rated: 18+ (R)

Solution 1
4 · Accepted · 2021-02-28 07:17:56
