Removing words from strings within a dataframe column
I have a dataframe like this:
Num Text
1 15 March 2020 - There was...
2 15 March 2020 - There has been...
3 24 April 2018 - Nothing has ...
4 07 November 2014 - The Kooks....
...
I want to remove the first 4 words from each row in Text (i.e. 15 March 2020 -, 15 March 2020 -,
...). I tried
df['Text']=df['Text'].str.replace(' ', )
but I don't know what I should put in the parentheses to replace these values with a space (or with nothing).
You can do this with str.split.
Assuming your df is:
In [1193]: df = pd.DataFrame({'Num':[1,2,3,4], 'Text':['15 March 2020 - There was','15 March 2020 - There has been','24 April 2018 - Nothing has','07 November 2014 - The Kooks']})
In [1194]: df
Out[1194]:
Num Text
0 1 15 March 2020 - There was
1 2 15 March 2020 - There has been
2 3 24 April 2018 - Nothing has
3 4 07 November 2014 - The Kooks
In [1207]: df['Text'].str.split().str[4:].apply(' '.join)
Out[1207]:
0 There was
1 There has been
2 Nothing has
3 The Kooks
Name: Text, dtype: object
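The same chain can be wrapped in a small reusable helper. This is a minimal sketch; the function name drop_first_words and its parameter n are hypothetical, not part of pandas, and it assumes the column contains no missing values (' '.join would fail on NaN).

```python
import pandas as pd

def drop_first_words(s: pd.Series, n: int = 4) -> pd.Series:
    """Hypothetical helper: drop the first n whitespace-separated words of each string."""
    # str.split() with no pattern splits on runs of whitespace,
    # .str[n:] slices each resulting list, and ' '.join glues the remainder back.
    return s.str.split().str[n:].apply(' '.join)

df = pd.DataFrame({'Num': [1, 2, 3, 4],
                   'Text': ['15 March 2020 - There was',
                            '15 March 2020 - There has been',
                            '24 April 2018 - Nothing has',
                            '07 November 2014 - The Kooks']})
df['Text'] = drop_first_words(df['Text'])
print(df)
```

Note that re-joining with a single space normalizes any repeated spaces in the remaining text.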
What might help is using the split command to break the string into words, then using [4:] to keep everything after the 4th word.
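On a single plain Python string, the split-then-slice idea looks like this: indices 0 to 3 hold the date parts and the dash, so [4:] keeps everything from the fifth word onward.

```python
words = '15 March 2020 - There was'.split()
# words is ['15', 'March', '2020', '-', 'There', 'was']
rest = words[4:]          # drop indices 0-3, i.e. the first four words
print(rest)               # ['There', 'was']
print(' '.join(rest))     # There was
```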
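Python supports regular expressions for this kind of pattern, for example four words at the start of the string: str.replace(r'^\S+ \S+ \S+ \S+ ', '', regex=True). (\d only matches digits, so a pattern like \d* \d* \d* \d* would not match words such as "March" or "-".)
Here is a link for learning more about Python regular expressions and how to detect different patterns in a string.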
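A runnable sketch of the regex approach, assuming the question's Text column. The pattern (?:\S+\s+){4} is one way to say "four words, each followed by whitespace"; \S+ also matches the "-". regex=True is passed explicitly because recent pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

df = pd.DataFrame({'Num': [1, 2, 3, 4],
                   'Text': ['15 March 2020 - There was',
                            '15 March 2020 - There has been',
                            '24 April 2018 - Nothing has',
                            '07 November 2014 - The Kooks']})

# ^          anchor at the start so only the leading words are removed
# \S+\s+     one "word" (run of non-whitespace) plus the whitespace after it
# {4}        repeat that four times
pattern = r'^(?:\S+\s+){4}'
df['Text'] = df['Text'].str.replace(pattern, '', regex=True)
print(df['Text'].tolist())
```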
Use str.split together with .str indexing (which pandas dispatches to str.get / str.slice):
df['Text'].str.split(n=4).str[-1]
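With n=4, str.split performs at most four splits, so the last piece is the untouched remainder of the string (spacing inside it is preserved). A self-contained sketch using the question's Text column:

```python
import pandas as pd

df = pd.DataFrame({'Num': [1, 2, 3, 4],
                   'Text': ['15 March 2020 - There was',
                            '15 March 2020 - There has been',
                            '24 April 2018 - Nothing has',
                            '07 November 2014 - The Kooks']})

# n=4 yields at most five pieces; piece -1 is everything after the fourth word.
df['Text'] = df['Text'].str.split(n=4).str[-1]
print(df['Text'].tolist())
```

One caveat: a row with fewer than five words returns its last word instead of an empty string.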
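Even though it is less elegant, I prefer using ".find()" together with ".apply()". Whatever comes before it, the first "- " located by ".find()" is treated as the separator.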
import pandas as pd
t = pd.DataFrame({'Num':[1,2,3,4], 'Text':['15 March 2020 - There was','15 March 2020 - There has been','24 April 2018 - Nothing has','07 November 2014 - The Kooks']})
t["text2"] = t.apply(lambda x: x['Text'][str(x['Text']).find("- ")+2:], axis=1)
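One caveat with .find(): if a row contains no "- ", find returns -1, so the slice starts at index 1 and silently drops the first character. A vectorized alternative (a substitute technique, not the answer's own code) splits once on the separator and keeps the whole string when the separator is missing. The fifth row below is a hypothetical example without a date.

```python
import pandas as pd

t = pd.DataFrame({'Num': [1, 2, 3, 4, 5],
                  'Text': ['15 March 2020 - There was',
                           '15 March 2020 - There has been',
                           '24 April 2018 - Nothing has',
                           '07 November 2014 - The Kooks',
                           'No date here']})  # hypothetical row without "- "

# Split at most once on the first "- "; rows without it give a one-element
# list, so .str[-1] returns the original text unchanged.
t['text2'] = t['Text'].str.split('- ', n=1).str[-1]
print(t['text2'].tolist())
```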
This:
Num Text
1 15 March 2020 - There was...
2 15 March 2020 - There has been...
3 24 April 2018 - Nothing has ...
4 07 November 2014 - The Kooks....
becomes this:
Num Text text2
0 1 15 March 2020 - There was There was
1 2 15 March 2020 - There has been There has been
2 3 24 April 2018 - Nothing has Nothing has
3 4 07 November 2014 - The Kooks The Kooks