Extract words from the text by index into a new column (Pandas, Python)
I have a dataframe:
import pandas as pd
data = {'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
'Mom called dad, and when he came home, he took moms car and drove to the store'],
'begin_end':[[128, 139],[20,31]]}
df = pd.DataFrame(data)
I want to use the array in the begin_end column to extract words from the text column into a new column, like text[128:139+1]. So the result would be:
begin_end new_col
0 [128, 139] have visited
1 [20, 31] when he came
You need to use a loop:
df['new_col'] = [s[a:b+1] for s, (a,b) in zip(df['text'], df['begin_end'])]
Output:
text begin_end new_col
0 They say that all cats land on their feet, but... [128, 139] have visited
1 Mom called dad, and when he came home, he took... [20, 31] when he came
You can try this simple approach:
import pandas as pd
data = pd.DataFrame({'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
'Mom called dad, and when he came home, he took moms car and drove to the store'],
'begin_end':[[128, 139],[20,31]]})
data
Output:
text begin_end
0 They say that all cats land on their feet, but... [128, 139]
1 Mom called dad, and when he came home, he took... [20, 31]
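Apply a function:
def getString(string, location):
    # Only slice when begin is before end; you can add more conditions here.
    if location[0] < location[1]:
        # +1 because the end index in begin_end is inclusive.
        return string[location[0]:location[1] + 1]
    # Otherwise the function implicitly returns None.

data['new_col'] = data.apply(lambda x: getString(x['text'], x['begin_end']), axis=1)
data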
Output:
text begin_end new_col
0 They say that all cats land on their feet, but... [128, 139] have visited
1 Mom called dad, and when he came home, he took... [20, 31] when he came
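The "more conditions" mentioned in the comment could, for example, also check that the span lies inside the string. The following is only a sketch under that assumption; the helper name extract_span and the second sample row are hypothetical and not part of the original answer:

```python
import pandas as pd


def extract_span(text, span):
    """Return text[begin:end + 1], or None if the span does not fit the text."""
    begin, end = span
    # Valid only if 0 <= begin <= end and end is a real index into text.
    if 0 <= begin <= end < len(text):
        return text[begin:end + 1]
    return None


data = pd.DataFrame({
    "text": ["Mom called dad, and when he came home", "short"],
    "begin_end": [[20, 31], [3, 10]],  # the second span runs past the end of "short"
})

# Row-wise apply, as in the answer above; invalid spans become missing values.
data["new_col"] = data.apply(
    lambda row: extract_span(row["text"], row["begin_end"]), axis=1
)
print(data)
```

Without the bounds check, Python slicing would silently return a shortened string for a span that runs past the end of the text, so returning None makes such rows easier to spot.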
Try this:
df['begin'] = df['begin_end'].apply(lambda x: x[0])
df['end'] = df['begin_end'].apply(lambda x: x[1])
df['new_col'] = df.apply(lambda x: x['text'][x['begin']:x['end']+1], axis=1)
Output:
text begin_end begin end new_col
0 ... [128, 139] 128 139 have visited
1 ... [20, 31] 20 31 when he came
If begin and end are already stored separately, you do not need to extract them from begin_end. Unless it is necessary, try to avoid storing a list in a pd.Series().
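One possible way to follow that advice is to expand the list column into two plain integer columns once and then drop it. This is a sketch, not the answerer's code; the intermediate name bounds is an assumption made for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["Mom called dad, and when he came home, he took moms car and drove to the store"],
    "begin_end": [[20, 31]],
})

# Build a two-column frame from the lists, aligned on the original index.
bounds = pd.DataFrame(df["begin_end"].tolist(), columns=["begin", "end"], index=df.index)

# Replace the list column with the two integer columns.
df = df.drop(columns="begin_end").join(bounds)

# The end index is inclusive, hence b + 1.
df["new_col"] = [s[a:b + 1] for s, a, b in zip(df["text"], df["begin"], df["end"])]
print(df[["begin", "end", "new_col"]])
```

With begin and end stored as ordinary integer columns, they can be filtered, sorted, and compared like any other numeric column.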