Extract words from text by index into a new column (pandas, Python)
I have a DataFrame:
import pandas as pd
data = {'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
'Mom called dad, and when he came home, he took moms car and drove to the store'],
'begin_end':[[128, 139],[20,31]]}
df = pd.DataFrame(data)
I want to use the arrays in the begin_end column to extract words from the text column into a new column, for example text[128:139+1]. So the result would be:
begin_end new_col
0 [128, 139] have visited
1 [20, 31] when he came
You need to use a loop:
df['new_col'] = [s[a:b+1] for s, (a,b) in zip(df['text'], df['begin_end'])]
Output:
text begin_end new_col
0 They say that all cats land on their feet, but... [128, 139] have visited
1 Mom called dad, and when he came home, he took... [20, 31] when he came
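As a small check on why the loop is used here: Series.str.slice takes one start and one stop for the whole column, so it does not cover bounds that differ per row. The sketch below uses a hypothetical short sample (not the data from the question) so the character positions are easy to verify by eye.

```python
import pandas as pd

# Hypothetical short sample so the character positions are easy to check by eye.
df = pd.DataFrame({
    "text": ["hello world", "pandas is fun"],
    "begin_end": [[0, 4], [7, 8]],
})

# str.slice applies the same start/stop to every row.
same_bounds = df["text"].str.slice(0, 5)

# Per-row bounds need row-wise iteration; the end index is inclusive, hence b + 1.
df["new_col"] = [s[a:b + 1] for s, (a, b) in zip(df["text"], df["begin_end"])]

print(same_bounds.tolist())    # ['hello', 'panda']
print(df["new_col"].tolist())  # ['hello', 'is']
```

The + 1 matters because Python slices exclude the stop position, while the begin_end pairs in the question treat the end index as part of the word.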
You can try this in a very simple way:
import pandas as pd
data = pd.DataFrame({'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
'Mom called dad, and when he came home, he took moms car and drove to the store'],
'begin_end':[[128, 139],[20,31]]})
data
Output:
text begin_end
0 They say that all cats land on their feet, but... [128, 139]
1 Mom called dad, and when he came home, he took... [20, 31]
Apply a function:
def getString(string, location):
    # Check the condition; more conditions can be added here.
    if location[0] < location[1]:
        return string[location[0]:location[1] + 1]  # end index is inclusive
    return None  # explicit: no slice when begin >= end
data['new_col']= data.apply(lambda x : getString(x['text'],x['begin_end']),axis=1)
data
Output:
text begin_end new_col
0 They say that all cats land on their feet, but... [128, 139] have visited
1 Mom called dad, and when he came home, he took... [20, 31] when he came
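If you do want "more conditions", one possible sketch is below. The name extract_span and the sample rows are hypothetical, and the rules it applies (reject negative, reversed, or out-of-range spans, but allow a one-character span where begin equals end) are my assumptions rather than anything stated in the question.

```python
import pandas as pd

def extract_span(text, span):
    """Return text[begin:end + 1], or None when the span is not usable."""
    begin, end = span
    # Reject negative, reversed, or out-of-range spans instead of slicing silently.
    if begin < 0 or end < begin or end >= len(text):
        return None
    return text[begin:end + 1]

# Hypothetical sample: the second span runs past the end of its text.
df = pd.DataFrame({
    "text": ["hello world", "short"],
    "begin_end": [[6, 10], [3, 20]],
})
df["new_col"] = df.apply(lambda row: extract_span(row["text"], row["begin_end"]), axis=1)
print(df["new_col"].tolist())
```

Without the range check, Python would quietly return a shorter string for the second row, which can hide bad offsets in the data.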
Try this:
df['begin'] = df['begin_end'].apply(lambda x: x[0])
df['end'] = df['begin_end'].apply(lambda x: x[1])
df['new_col'] = df.apply(lambda x: x['text'][x['begin']:x['end']+1], axis=1)
Output:
text begin_end begin end new_col
0 ... [128, 139] 128 139 have visited
1 ... [20, 31] 20 31 when he came
If begin and end are already stored separately, you don't need to extract them from begin_end. Unless it is necessary, try to avoid storing a list in a pd.Series().
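One way to move away from the list column, sketched under the assumption that every begin_end entry has exactly two integers, is to expand it into two plain integer columns in a single step. The sample data and the name bounds are hypothetical.

```python
import pandas as pd

# Hypothetical short sample with a list-valued column.
df = pd.DataFrame({
    "text": ["hello world", "pandas is fun"],
    "begin_end": [[0, 4], [7, 8]],
})

# Expand the list column into two integer columns, keeping the original index.
bounds = pd.DataFrame(df["begin_end"].tolist(), columns=["begin", "end"], index=df.index)
df = df.drop(columns="begin_end").join(bounds)

# Slice each text with its own bounds; the end index is inclusive, hence b + 1.
df["new_col"] = [s[a:b + 1] for s, a, b in zip(df["text"], df["begin"], df["end"])]
print(df)
```

With begin and end as ordinary integer columns, later filtering or arithmetic on the offsets (for example df["end"] - df["begin"]) works without any apply call.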