Extract words from the text by index into a new column Pandas Python
I have a DataFrame:
import pandas as pd
data = {'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
'Mom called dad, and when he came home, he took moms car and drove to the store'],
'begin_end':[[128, 139],[20,31]]}
df = pd.DataFrame(data)
I want to use the arrays in the begin_end column to extract words from the text column into a new column, for example text[128:139+1]. So the result would be:
begin_end new_col
0 [128, 139] have visited
1 [20, 31] when he came
You need to use a loop, for example a list comprehension over both columns:
df['new_col'] = [s[a:b+1] for s, (a,b) in zip(df['text'], df['begin_end'])]
Output:
text begin_end new_col
0 They say that all cats land on their feet, but... [128, 139] have visited
1 Mom called dad, and when he came home, he took... [20, 31] when he came
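The +1 in the slice matters because Python slices exclude the end index, while the begin_end pairs in the question treat the end as inclusive. The following is only a sketch in plain Python to illustrate that; slice_inclusive is a hypothetical helper name, and returning None for out-of-range indices is an assumption, not something the question asks for.

```python
def slice_inclusive(text, begin, end):
    """Return text[begin..end] with both ends included (hypothetical helper)."""
    # Assumption: out-of-range or reversed indices yield None instead of raising.
    if not 0 <= begin <= end < len(text):
        return None
    return text[begin:end + 1]


sentence = "Mom called dad, and when he came home, he took moms car and drove to the store"

exclusive = sentence[20:31]                    # plain slice stops before index 31
inclusive = slice_inclusive(sentence, 20, 31)  # includes the character at index 31

print(repr(exclusive))
print(repr(inclusive))
```

The same helper could be dropped into the list comprehension in place of s[a:b+1] if rows with bad indices are expected.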
You can try this in a very simple way:
import pandas as pd
data = pd.DataFrame({'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
'Mom called dad, and when he came home, he took moms car and drove to the store'],
'begin_end':[[128, 139],[20,31]]})
data
Output:
text begin_end
0 They say that all cats land on their feet, but... [128, 139]
1 Mom called dad, and when he came home, he took... [20, 31]
Apply a function:
def getString(string, location):
    # Check the indices first; more conditions can be added here.
    # Note: a row that fails this check falls through and yields None.
    if location[0] < location[1]:
        return string[location[0]:location[1]+1]

data['new_col'] = data.apply(lambda x: getString(x['text'], x['begin_end']), axis=1)
data
Output:
text begin_end new_col
0 They say that all cats land on their feet, but... [128, 139] have visited
1 Mom called dad, and when he came home, he took... [20, 31] when he came
Try this:
df['begin'] = df['begin_end'].apply(lambda x: x[0])
df['end'] = df['begin_end'].apply(lambda x: x[1])
df['new_col'] = df.apply(lambda x: x['text'][x['begin']:x['end']+1], axis=1)
Output:
text begin_end begin end new_col
0 ... [128, 139] 128 139 have visited
1 ... [20, 31] 20 31 when he came
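As a possible variation on the two apply calls above, the list column can be expanded into begin and end in a single step by building a small DataFrame from it. This is a sketch on shortened sample data (the second row is made up for illustration), and the intermediate name bounds is an assumption.

```python
import pandas as pd

# Shortened sample data; the second row is invented for illustration.
df = pd.DataFrame({
    "text": ["Mom called dad, and when he came home", "abcdefghij"],
    "begin_end": [[20, 31], [2, 4]],
})

# Expand the two-element lists into two integer columns in one pass.
bounds = pd.DataFrame(df["begin_end"].tolist(), index=df.index, columns=["begin", "end"])
df = df.join(bounds)

# Slice each text with its own inclusive [begin, end] pair.
df["new_col"] = [s[a:b + 1] for s, a, b in zip(df["text"], df["begin"], df["end"])]

print(df[["begin", "end", "new_col"]])
```

Passing index=df.index keeps the new columns aligned with the original rows even if the frame does not use a default 0..n-1 index.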
If begin and end are already stored separately, you do not need to extract them from begin_end. Unless it is necessary, try to avoid storing lists in a pd.Series.
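One practical reason for keeping begin and end as separate integer columns is that the indices can then be checked with ordinary vectorised comparisons before slicing. The sketch below assumes a frame built that way from the start; the sample rows, the valid mask, and the choice to emit None for invalid rows are all illustrative assumptions.

```python
import pandas as pd

# Assumed layout: begin and end stored as their own integer columns.
df = pd.DataFrame({
    "text": ["Mom called dad, and when he came home", "short"],
    "begin": [20, 3],
    "end": [31, 9],
})

# Vectorised sanity check, possible because begin/end are plain integer columns.
valid = (
    (df["begin"] >= 0)
    & (df["begin"] <= df["end"])
    & (df["end"] < df["text"].str.len())
)

# Slice only the rows that passed the check; others get None.
df["new_col"] = [
    s[a:b + 1] if ok else None
    for s, a, b, ok in zip(df["text"], df["begin"], df["end"], valid)
]

print(df)
```

The actual slicing still needs a Python-level loop, since each row uses different bounds.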