
Removing words from strings within a column dataframe

I have a dataframe like this:

Num           Text 
1        15 March 2020 - There was...
2        15 March 2020 - There has been...
3        24 April 2018 - Nothing has ...
4        07 November 2014 - The Kooks....
...

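I would like to remove the first 4 words from every row in Text (i.e. 15 March 2020 -, 15 March 2020 -, ...). I tried

df['Text']=df['Text'].str.replace(' ', ) but I do not know what I should put inside the parentheses to replace those values with a space (or with nothing).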

You can do this with str.split:

Considering your df is:

In [1193]: df = pd.DataFrame({'Num':[1,2,3,4], 'Text':['15 March 2020 - There was','15 March 2020 - There has been','24 April 2018 - Nothing has','07 November 2014 - The Kooks']})

In [1194]: df
Out[1194]: 
   Num                            Text
0    1       15 March 2020 - There was
1    2  15 March 2020 - There has been
2    3     24 April 2018 - Nothing has
3    4    07 November 2014 - The Kooks

In [1207]: df['Text'].str.split().str[4:].apply(' '.join)
Out[1207]: 
0         There was
1    There has been
2       Nothing has
3         The Kooks
Name: Text, dtype: object

What might help is to split the text into words with split and then use [4:] to keep everything after the 4th word.
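As a small sketch of the split-and-slice idea, the snippet below wraps it in a helper (the name drop_first_words and the sample rows are assumptions made for illustration). It uses .str.join instead of .apply(' '.join) so that a missing value stays missing rather than raising an error.

```python
import pandas as pd

# Illustrative data; the None row is an assumption added to show missing values.
df = pd.DataFrame({
    "Num": [1, 2, 3],
    "Text": ["15 March 2020 - There was", "07 November 2014 - The Kooks", None],
})


def drop_first_words(text: pd.Series, n: int = 4) -> pd.Series:
    """Split on whitespace, drop the first n tokens, re-join with single spaces."""
    return text.str.split().str[n:].str.join(" ")


df["Text"] = drop_first_words(df["Text"])
```

Note that runs of whitespace inside the remaining text are collapsed to single spaces, because the tokens are re-joined with " ".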

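Python also supports regular expressions. Note that \d matches only digits, so a pattern such as str.replace("\d* \d* \d* \d*", '') will not match a prefix like "15 March 2020 -"; to drop the first four whitespace-separated words, something like df['Text'].str.replace(r"^(?:\S+\s+){4}", "", regex=True) should work. The Python regular expression documentation explains how to detect other patterns in strings.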
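If the prefix is always a date of the form shown in the question, a stricter pattern may be safer than counting words. The following is a sketch under that assumption (the pattern and the second sample row are illustrative); rows that do not start with such a date are left untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "Text": ["15 March 2020 - There was", "No date here - still text"],
})

# ^\d{1,2}     one or two digits for the day, anchored at the start
# [A-Za-z]+    the month name
# \d{4}        a four-digit year
# \s+-\s+      the dash separator with surrounding whitespace
date_prefix = r"^\d{1,2}\s+[A-Za-z]+\s+\d{4}\s+-\s+"

df["Text"] = df["Text"].str.replace(date_prefix, "", regex=True)
```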

You can use df.str.split together with df.str.slice.

df['Text'].str.split(n=4).str[-1]
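One caveat with taking the last piece: if a row has fewer than five words, .str[-1] returns its last word rather than nothing. A possible variant, sketched here with made-up sample rows, asks for the piece at position 4 with .str.get, which yields a missing value when that piece does not exist.

```python
import pandas as pd

text = pd.Series(["15 March 2020 - There was", "15 March 2020"])

# Split at most 4 times on whitespace, so index 4 (if present) is the remainder.
pieces = text.str.split(n=4)

# .str.get(4) gives NaN for rows that have no fifth piece.
remainder = pieces.str.get(4)
```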

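Even though it is less elegant, I prefer using ".find()" together with ".apply()". Whatever comes before it, the first "- " located by ".find" is treated as the separator.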

t = pd.DataFrame({'Num':[1,2,3,4], 'Text':['15 March 2020 - There was','15 March 2020 - There has been','24 April 2018 - Nothing has','07 November 2014 - The Kooks']})

t["text2"] = t.apply(lambda x: x['Text'][str(x['Text']).find("- ")+2:], axis=1)

This:

Num           Text 
1        15 March 2020 - There was...
2        15 March 2020 - There has been...
3        24 April 2018 - Nothing has ...
4        07 November 2014 - The Kooks....

becomes this:

   Num                            Text           text2
0    1       15 March 2020 - There was       There was
1    2  15 March 2020 - There has been  There has been
2    3     24 April 2018 - Nothing has     Nothing has
3    4    07 November 2014 - The Kooks       The Kooks
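One thing to watch with ".find()" is that it returns -1 when the separator is absent, so the slice would then start at index 1 and silently drop the first character. A minimal sketch of a guarded variant, using Python's str.partition (the helper name after_separator and the second sample row are assumptions for illustration):

```python
import pandas as pd

t = pd.DataFrame({
    "Num": [1, 2],
    "Text": ["15 March 2020 - There was", "no separator"],
})


def after_separator(text: str, sep: str = "- ") -> str:
    """Return what follows the first sep, or the text unchanged if sep is absent."""
    _head, found, tail = text.partition(sep)
    return tail if found else text


t["text2"] = t["Text"].map(after_separator)
```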

