Remove words from strings in a dataframe column

Question

I have a dataframe like this:

Num           Text 
1        15 March 2020 - There was...
2        15 March 2020 - There has been...
3        24 April 2018 - Nothing has ...
4        07 November 2014 - The Kooks....
...

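I would like to remove the first 4 words from every row of Text (i.e. 15 March 2020 -, 15 March 2020 -, ...). I tried

df['Text']=df['Text'].str.replace(' ', ) but I don't know what I should put inside the parentheses to replace these values with a space (or with nothing).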

Answer 1

You can do this with str.split:

Assuming your df is:

In [1193]: df = pd.DataFrame({'Num':[1,2,3,4], 'Text':['15 March 2020 - There was','15 March 2020 - There has been','24 April 2018 - Nothing has','07 November 2014 - The Kooks']})

In [1194]: df
Out[1194]: 
   Num                            Text
0    1       15 March 2020 - There was
1    2  15 March 2020 - There has been
2    3     24 April 2018 - Nothing has
3    4    07 November 2014 - The Kooks

In [1207]: df['Text'].str.split().str[4:].apply(' '.join)
Out[1207]: 
0         There was
1    There has been
2       Nothing has
3         The Kooks
Name: Text, dtype: object

Answer 2

It may help to use split to break each string into words, then use [4:] to keep everything after the fourth word.
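The split-then-slice idea can be sketched in plain Python as follows. The helper name drop_first_words and the sample list are illustrative assumptions, not part of the original answer; note that joining with a single space collapses any repeated whitespace in the remaining text.

```python
def drop_first_words(text, n=4):
    """Return text without its first n whitespace-separated words."""
    words = text.split()          # split on any run of whitespace
    return ' '.join(words[n:])    # keep everything after the n-th word


# Sample strings taken from the question
samples = [
    '15 March 2020 - There was',
    '15 March 2020 - There has been',
    '24 April 2018 - Nothing has',
    '07 November 2014 - The Kooks',
]

cleaned = [drop_first_words(s) for s in samples]
print(cleaned)
```

With pandas, the same function could be applied to the column via df['Text'].apply(drop_first_words).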

Answer 3

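Python supports regular expressions, so you can match the first four words with a pattern, for example str.replace(r"^(?:\S+\s+){4}", '', regex=True). The Python regular expression documentation has more on how to detect different patterns in strings.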
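A minimal sketch of the regular-expression approach, using the standard re module. The pattern below is an assumption about what "four words" means here (four runs of non-whitespace, each followed by whitespace, anchored at the start of the string); the names LEADING_FOUR_WORDS and strip_leading_words are hypothetical.

```python
import re

# Four whitespace-delimited tokens at the start of the string.
# \S+ matches a token such as "15", "March", "2020" or "-".
LEADING_FOUR_WORDS = re.compile(r'^(?:\S+\s+){4}')


def strip_leading_words(text):
    """Remove the first four words; leave shorter strings unchanged."""
    return LEADING_FOUR_WORDS.sub('', text, count=1)


samples = [
    '15 March 2020 - There was',
    '24 April 2018 - Nothing has',
    '07 November 2014 - The Kooks',
]

cleaned = [strip_leading_words(s) for s in samples]
print(cleaned)
```

In pandas the same pattern can be passed to df['Text'].str.replace(pattern, '', regex=True). A digit-only pattern such as \d would not match tokens like "March" or "-".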

Answer 4

Use str.split with n=4 (at most four splits) and take the last piece through the .str accessor:

df['Text'].str.split(n=4).str[-1]
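To see what the n=4 split produces, here is a small sketch that keeps the intermediate lists. The third row ('too short') is an added assumption to show the edge case: with .str[-1] a row with fewer than five words would return its last word, so this variant uses .str[4] and fills the missing value with an empty string instead.

```python
import pandas as pd

df = pd.DataFrame({'Text': [
    '15 March 2020 - There was',
    '24 April 2018 - Nothing has',
    'too short',                      # hypothetical row with fewer than 5 words
]})

# At most 4 splits, so each list has at most 5 elements;
# the 5th element holds the untouched remainder of the string.
parts = df['Text'].str.split(n=4)
print(parts[0])

# Element 4 is the remainder; rows without one become NaN, then ''.
rest = parts.str[4].fillna('')
print(rest.tolist())
```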

Answer 5

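Even if it is less elegant, I prefer to use ".find()" together with ".apply()". Whatever else the string contains, the first "- " that ".find" locates is treated as the separator.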

t = pd.DataFrame({'Num':[1,2,3,4], 'Text':['15 March 2020 - There was','15 March 2020 - There has been','24 April 2018 - Nothing has','07 November 2014 - The Kooks']})

t["text2"] = t.apply(lambda x: x['Text'][str(x['Text']).find("- ")+2:], axis=1)

This:

Num           Text 
1        15 March 2020 - There was...
2        15 March 2020 - There has been...
3        24 April 2018 - Nothing has ...
4        07 November 2014 - The Kooks....

becomes this:

   Num                            Text           text2
0    1       15 March 2020 - There was       There was
1    2  15 March 2020 - There has been  There has been
2    3     24 April 2018 - Nothing has     Nothing has
3    4    07 November 2014 - The Kooks       The Kooks
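For comparison, the same separator-based idea can be written without a row-wise apply. This is a different technique from the answer above, offered only as a sketch: it assumes the separator is " - " (with a space on both sides) and splits once on its first occurrence; rows without the separator are left unchanged.

```python
import pandas as pd

t = pd.DataFrame({
    'Num': [1, 2, 3, 4],
    'Text': [
        '15 March 2020 - There was',
        '15 March 2020 - There has been',
        '24 April 2018 - Nothing has',
        '07 November 2014 - The Kooks',
    ],
})

# Split once on the first " - " and keep the last piece of each row.
t['text2'] = t['Text'].str.split(' - ', n=1).str[-1]
print(t['text2'].tolist())
```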

5 answers

Answer 1
0 2020-05-20 21:00:24

Answer 2
0 2020-05-20 21:01:55

Answer 3
0 2020-05-20 21:03:46

Answer 4
0 accepted 2020-05-20 21:04:17

Answer 5
0 2020-05-20 21:15:57
