How can I use pandas.DataFrame.replace(regex=regex) to replace any value that does not match a regular expression?

Question

I am using pandas to import and clean a dataset. I have a column named "rating" whose values should match this regular expression:

regex = "(\d\.\d out of 5 stars)"

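That works. What I want to know is how to replace only the values that do not match the regex. pandas.DataFrame.replace(to_replace=regex, value=np.nan, regex=True) works the opposite way: it replaces everything that does match the regex. Is there some syntax I am missing, or another way to accomplish this? Thanks in advance.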

Answer 1

If I understand correctly, you can use str.contains with boolean indexing:

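import numpy as np
import pandas as pd

df = pd.DataFrame({'rating':['2.2 out of 5 stars', 'no rating', 'rating is 3.3 out of 5 stars', '?']})
regex = r"(\d\.\d out of 5 stars)"  # raw string so the backslashes reach the regex engine unchanged

# Set the rating to NaN in every row where it does not contain the pattern
df.loc[~df.rating.str.contains(regex), 'rating'] = np.nan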

Result:

                         rating
0            2.2 out of 5 stars
1                           NaN
2  rating is 3.3 out of 5 stars
3                           NaN
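
Note that str.contains keeps row 2, because the pattern only has to appear somewhere in the string. If the whole value is supposed to match, a minimal sketch along the following lines may help. It assumes pandas 1.1 or later (for str.fullmatch), and the names pattern and is_valid are only illustrative:

```python
import pandas as pd

df = pd.DataFrame({"rating": ["2.2 out of 5 stars", "no rating",
                              "rating is 3.3 out of 5 stars", "?"]})
pattern = r"\d\.\d out of 5 stars"

# True only where the entire string matches the pattern;
# na=False treats missing values as non-matching
is_valid = df["rating"].str.fullmatch(pattern, na=False)

# where() keeps values where the condition is True and puts NaN elsewhere
df["rating"] = df["rating"].where(is_valid)
print(df)
```

With this stricter check, only row 0 should survive; rows 1, 2 and 3 become NaN.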

If you want to extract the rating itself, you can do it like this:

df.rating = df.rating.str.extract(regex, expand=False)

Result:

               rating
0  2.2 out of 5 stars
1                 NaN
2  3.3 out of 5 stars
3                 NaN
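
As for doing this with replace itself, as the question asks: one possible approach is to invert the pattern with a negative lookahead, so that the regex matches exactly those strings that are not a valid rating. This is only a sketch. The name inverted is made up for illustration, and it relies on replace substituting the whole cell when the replacement value is NaN, which is worth checking on your pandas version:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"rating": ["2.2 out of 5 stars", "no rating",
                              "rating is 3.3 out of 5 stars", "?"]})

# Matches at the start of any string that is NOT exactly "<d.d> out of 5 stars"
inverted = r"^(?!\d\.\d out of 5 stars$).*$"

cleaned = df.replace(to_replace=inverted, value=np.nan, regex=True)
print(cleaned)
```

Unlike the str.contains version, this treats 'rating is 3.3 out of 5 stars' as invalid, because the lookahead requires the entire value to be a rating.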

Solution 1
3 2019-09-08 13:26:21
