
Extract words from the text by index into a new column Pandas Python

I have a dataframe:

import pandas as pd

data = {'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
                 'Mom called dad, and when he came home, he took moms car and drove to the store'],
        'begin_end': [[128, 139], [20, 31]]}

df = pd.DataFrame(data)

I want to use the array in the begin_end column to extract words from the text column into a new column, like text[128:139+1]. So it will be:

    begin_end     new_col
0   [128, 139]  have visited
1   [20, 31]    when he came

You need to use a loop:

df['new_col'] = [s[a:b+1] for s, (a,b) in zip(df['text'], df['begin_end'])]

Output:

                                                text   begin_end       new_col
0  They say that all cats land on their feet, but...  [128, 139]  have visited
1  Mom called dad, and when he came home, he took...    [20, 31]  when he came
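As a self-contained illustration of the loop above, here is a minimal sketch that wraps the slice in a small helper. The name slice_inclusive is hypothetical (it is not part of pandas or of the original answer); it only makes explicit that the second value in begin_end is treated as an inclusive end index.

```python
import pandas as pd


def slice_inclusive(text, span):
    """Return text[begin:end + 1] for a (begin, end) pair with an inclusive end.

    Hypothetical helper, introduced here for illustration only.
    """
    begin, end = span
    return text[begin:end + 1]


df = pd.DataFrame({
    "text": ["Mom called dad, and when he came home, he took moms car and drove to the store"],
    "begin_end": [[20, 31]],
})

# Walk the two columns in parallel, one (text, span) pair per row.
df["new_col"] = [slice_inclusive(s, span) for s, span in zip(df["text"], df["begin_end"])]
print(df["new_col"].tolist())  # ['when he came']
```

A plain list comprehension over zip is usually enough here, because every row has its own slice bounds and there is no single vectorized string slice that takes per-row start and stop values.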

You can try this in a very simple way:

    import pandas as pd
    data = pd.DataFrame({'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
                    'Mom called dad, and when he came home, he took moms car and drove to the store'],
           'begin_end':[[128, 139],[20,31]]})
    data
Output:

    text                                                  begin_end
    0   They say that all cats land on their feet, but...   [128, 139]
    1   Mom called dad, and when he came home, he took...   [20, 31]

Apply a function:

def getString(string, location):
    # Only slice when begin < end; more conditions can be added here
    if location[0] < location[1]:
        return string[location[0]:location[1] + 1]
    return None  # rows that fail the check get no value

data['new_col'] = data.apply(lambda x: getString(x['text'], x['begin_end']), axis=1)
data

Output:

    text                                             begin_end  new_col
0   They say that all cats land on their feet, but...   [128, 139]  have visited
1   Mom called dad, and when he came home, he took...   [20, 31]    when he came
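The "more conditions" mentioned in the comment could look like the following sketch. It assumes that a span should be rejected when it is reversed or runs past the end of the text; that rule, the name safe_slice, and the toy data are assumptions for illustration, not part of the original answer. Note that Python slicing does not raise on out-of-range indexes, so such rows would otherwise be silently truncated.

```python
import pandas as pd


def safe_slice(text, span):
    """Slice text[begin:end + 1] only when the span lies fully inside the text.

    Hypothetical helper: returns None for reversed or out-of-range spans.
    """
    begin, end = span
    if 0 <= begin <= end < len(text):
        return text[begin:end + 1]
    return None


# Toy data (assumed): one valid span, one reversed span, one span past the end.
data = pd.DataFrame({
    "text": ["abcdef", "abcdef", "abcdef"],
    "begin_end": [[1, 3], [4, 2], [2, 10]],
})

data["new_col"] = data.apply(lambda row: safe_slice(row["text"], row["begin_end"]), axis=1)
print(data)
```

Only the first row gets a value ("bcd"); the other two are left missing, which makes bad spans easy to find afterwards with data["new_col"].isna().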

Try this:

df['begin'] = df['begin_end'].apply(lambda x: x[0])
df['end'] = df['begin_end'].apply(lambda x: x[1])
df['new_col'] = df.apply(lambda x: x['text'][x['begin']:x['end']+1], axis=1)

Output:

 text    begin_end  begin  end       new_col
0 ...   [128, 139]    128  139  have visited
1 ...     [20, 31]     20   31  when he came

If begin and end are already stored separately, you do not need to extract them from begin_end. If it is not necessary, try to avoid storing a list in a pd.Series().
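One possible way to follow that advice, sketched under the assumption that every begin_end entry holds exactly two integers, is to expand the list column into two scalar columns in a single step and drop the original. The variable name bounds is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["Mom called dad, and when he came home, he took moms car and drove to the store"],
    "begin_end": [[20, 31]],
})

# Expand each [begin, end] list into two integer columns aligned on the same index.
bounds = pd.DataFrame(df["begin_end"].tolist(), index=df.index, columns=["begin", "end"])

# Replace the list column with the scalar columns.
df = df.drop(columns="begin_end").join(bounds)

# Slice each text with its own bounds; end is inclusive, hence the + 1.
df["new_col"] = [s[a:b + 1] for s, a, b in zip(df["text"], df["begin"], df["end"])]
print(df[["begin", "end", "new_col"]])
```

With begin and end as ordinary integer columns, filtering and sanity checks (for example df["begin"] <= df["end"]) become regular vectorized comparisons.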
