Extract words from the text by index into a new column Pandas Python

I have a DataFrame:

import pandas as pd

data = {'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
                 'Mom called dad, and when he came home, he took moms car and drove to the store'],
        'begin_end': [[128, 139], [20, 31]]}

df = pd.DataFrame(data)

I want to use the arrays in the begin_end column to extract words from the text column into a new column, for example text[128:139+1]. So the result would be:

    begin_end     new_col
0   [128, 139]  have visited
1   [20, 31]    when he came

You need to use a loop:

df['new_col'] = [s[a:b+1] for s, (a,b) in zip(df['text'], df['begin_end'])]

Output:

                                                text   begin_end       new_col
0  They say that all cats land on their feet, but...  [128, 139]  have visited
1  Mom called dad, and when he came home, he took...    [20, 31]  when he came
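The `+1` in the comprehension matters because Python slices exclude the stop index, while `begin_end` here holds inclusive offsets. A minimal sketch on miniature data (the second row, `"hello world"` with `[6, 10]`, is a made-up example and not part of the question):

```python
import pandas as pd

# Hypothetical miniature data; begin_end holds inclusive [start, end] character offsets.
df = pd.DataFrame({
    "text": ["Mom called dad, and when he came home", "hello world"],
    "begin_end": [[20, 31], [6, 10]],
})

# A slice stops *before* its stop index, so add 1 to keep the character at `end`.
df["new_col"] = [s[a:b + 1] for s, (a, b) in zip(df["text"], df["begin_end"])]

print(df["new_col"].tolist())  # ['when he came', 'world']
```

Iterating with `zip` over the two columns avoids the per-row overhead of `DataFrame.apply(axis=1)`, which is why a plain loop tends to be the faster choice here.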

You can try this in a very simple way:

    import pandas as pd
    data = pd.DataFrame({'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
                    'Mom called dad, and when he came home, he took moms car and drove to the store'],
           'begin_end':[[128, 139],[20,31]]})
    data
Output:

                                                    text   begin_end
    0  They say that all cats land on their feet, but...  [128, 139]
    1  Mom called dad, and when he came home, he took...    [20, 31]

Apply a function:

def getString(string, location):
    # Only slice when start < end; more conditions can be added here.
    if location[0] < location[1]:
        return string[location[0]:location[1] + 1]
    return None  # the span did not pass the check

data['new_col'] = data.apply(lambda x: getString(x['text'], x['begin_end']), axis=1)
data

Output:

                                                text   begin_end       new_col
0  They say that all cats land on their feet, but...  [128, 139]  have visited
1  Mom called dad, and when he came home, he took...    [20, 31]  when he came
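The answer notes that more conditions can be applied. One possible version, as a sketch only, also checks that the span lies inside the string. The helper name `extract_span` and the second sample row are assumptions made for illustration:

```python
import pandas as pd

def extract_span(text, span):
    """Return text[start:end + 1], or None if the span does not fit the text."""
    start, end = span
    # Require a non-negative, ordered span that ends inside the string.
    if 0 <= start <= end < len(text):
        return text[start:end + 1]
    return None

# Hypothetical data: the second span runs past the end of its text.
data = pd.DataFrame({
    "text": ["Mom called dad, and when he came home", "short"],
    "begin_end": [[20, 31], [3, 50]],
})

data["new_col"] = data.apply(
    lambda row: extract_span(row["text"], row["begin_end"]), axis=1
)
```

Without the upper-bound check, Python would silently truncate an out-of-range slice rather than raise, so returning a missing value makes bad offsets visible in the result.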

Try this:

df['begin'] = df['begin_end'].apply(lambda x: x[0])
df['end'] = df['begin_end'].apply(lambda x: x[1])
df['new_col'] = df.apply(lambda x: x['text'][x['begin']:x['end']+1], axis=1)

Output:

 text    begin_end  begin  end       new_col
0 ...   [128, 139]    128  139  have visited
1 ...     [20, 31]     20   31  when he came

If begin and end were already stored separately, you would not need to extract them from begin_end. Unless it is necessary, try to avoid storing lists in a pd.Series.
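One way to move away from a list column, sketched here under the assumption that every `begin_end` entry has exactly two elements, is to expand it into two integer columns in a single step. The miniature data is illustrative only:

```python
import pandas as pd

# Hypothetical miniature data with a list-valued column.
df = pd.DataFrame({
    "text": ["Mom called dad, and when he came home", "hello world"],
    "begin_end": [[20, 31], [6, 10]],
})

# Build a two-column frame from the lists, aligned on the original index.
spans = pd.DataFrame(df["begin_end"].tolist(), columns=["begin", "end"], index=df.index)

# Replace the list column with the two scalar columns.
df = df.drop(columns="begin_end").join(spans)

# The slice can now read plain integer columns.
df["new_col"] = [s[a:b + 1] for s, a, b in zip(df["text"], df["begin"], df["end"])]
```

Scalar integer columns keep a numeric dtype, so they can be filtered, compared, and saved without the object-dtype overhead that lists bring.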
