
Extract words from the text by index into a new column Pandas Python

I have a DataFrame:

import pandas as pd

data = {'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
                'Mom called dad, and when he came home, he took moms car and drove to the store'],
       'begin_end':[[128, 139],[20,31]]}
        
df = pd.DataFrame(data)

I want to use the arrays in the begin_end column to extract words from the text column into a new column, for example text[128:139+1]. The result would be:

    begin_end     new_col
0   [128, 139]  have visited
1   [20, 31]    when he came

You need to use a loop, here a list comprehension, because each row has its own slice bounds:

df['new_col'] = [s[a:b+1] for s, (a,b) in zip(df['text'], df['begin_end'])]

Output:

                                                text   begin_end       new_col
0  They say that all cats land on their feet, but...  [128, 139]  have visited
1  Mom called dad, and when he came home, he took...    [20, 31]  when he came
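The `b+1` in the comprehension matters because the offsets in begin_end are inclusive, while Python slices exclude the stop index. A minimal sketch of the difference, using a shortened copy of the second row from the question (the column names `exclusive` and `inclusive` are made up for illustration):

```python
import pandas as pd

# Shortened copy of the second row from the question.
df = pd.DataFrame({
    "text": ["Mom called dad, and when he came home"],
    "begin_end": [[20, 31]],
})

pairs = list(zip(df["text"], df["begin_end"]))

# Without +1 the last character of the span is dropped.
df["exclusive"] = [s[a:b] for s, (a, b) in pairs]
# With +1 the character at the end offset is included.
df["inclusive"] = [s[a:b + 1] for s, (a, b) in pairs]

print(df[["exclusive", "inclusive"]])
```

Here `exclusive` holds "when he cam" and `inclusive` holds "when he came".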

You can try this in a very simple way:

    import pandas as pd
    data = pd.DataFrame({'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
                    'Mom called dad, and when he came home, he took moms car and drove to the store'],
           'begin_end':[[128, 139],[20,31]]})
    data
Output:

    text                                                  begin_end
    0   They say that all cats land on their feet, but...   [128, 139]
    1   Mom called dad, and when he came home, he took...   [20, 31]

Apply a function:

def getString(string, location):
    # Only slice when begin < end; you can add more conditions here
    if location[0] < location[1]:
        return string[location[0]:location[1] + 1]
    return None  # invalid range: leave the cell empty

data['new_col'] = data.apply(lambda x: getString(x['text'], x['begin_end']), axis=1)
data

Output:

    text                                             begin_end  new_col
0   They say that all cats land on their feet, but...   [128, 139]  have visited
1   Mom called dad, and when he came home, he took...   [20, 31]    when he came
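The condition inside the function is where extra validation could go. As a rough sketch, assuming you want rows with reversed or out-of-range offsets to yield a missing value instead of a wrong substring, a helper like the hypothetical `safe_slice` below (not part of the original answer) could be used the same way:

```python
import pandas as pd

def safe_slice(text, bounds):
    """Hypothetical helper: return text[begin:end+1], or None if the bounds are invalid."""
    begin, end = bounds
    # Accept only offsets that lie inside the string and are in order.
    if 0 <= begin <= end < len(text):
        return text[begin:end + 1]
    return None

# Made-up rows: the first has valid offsets, the second has them reversed.
df = pd.DataFrame({
    "text": ["Mom called dad", "Mom called dad"],
    "begin_end": [[4, 9], [9, 4]],
})

df["new_col"] = df.apply(lambda row: safe_slice(row["text"], row["begin_end"]), axis=1)
print(df)
```

The first row gives "called"; the second row is left as a missing value.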

Try this:

df['begin'] = df['begin_end'].apply(lambda x: x[0])
df['end'] = df['begin_end'].apply(lambda x: x[1])
df['new_col'] = df.apply(lambda x: x['text'][x['begin']:x['end']+1], axis=1)

Output:

 text    begin_end  begin  end       new_col
0 ...   [128, 139]    128  139  have visited
1 ...     [20, 31]     20   31  when he came

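If begin and end were already stored separately, you would not need to extract them from begin_end. Unless it is necessary, try to avoid storing lists in a pd.Series().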
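One possible way to follow that advice is to expand the list column into two integer columns once and drop the original. This is only a sketch on a shortened copy of the question's second row; the intermediate name `bounds` is an assumption for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["Mom called dad, and when he came home"],
    "begin_end": [[20, 31]],
})

# Expand each [begin, end] list into two integer columns aligned on the index.
bounds = pd.DataFrame(df["begin_end"].tolist(), index=df.index, columns=["begin", "end"])

# Replace the list column with the two scalar columns.
df = df.drop(columns="begin_end").join(bounds)

# The slice still runs row by row, but now reads plain integer columns.
df["new_col"] = [s[a:b + 1] for s, a, b in zip(df["text"], df["begin"], df["end"])]
print(df)
```

With begin and end as ordinary integer columns, filtering or sorting by offset no longer requires unpacking lists.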
