
Extract words from the text by index into a new column Pandas Python

I have a DataFrame:

import pandas as pd
data = {'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
                'Mom called dad, and when he came home, he took moms car and drove to the store'],
       'begin_end':[[128, 139],[20,31]]}
        
df = pd.DataFrame(data)

I want to use the [begin, end] pair in the begin_end column to slice the matching words out of the text column into a new column, as in text[128:139+1]. The expected result is:

    begin_end     new_col
0   [128, 139]  have visited
1   [20, 31]    when he came

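Series.str.slice only takes scalar start and stop values, so per-row bounds need a loop. A list comprehension over both columns works: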

df['new_col'] = [s[a:b+1] for s, (a,b) in zip(df['text'], df['begin_end'])]

Output:

                                                text   begin_end       new_col
0  They say that all cats land on their feet, but...  [128, 139]  have visited
1  Mom called dad, and when he came home, he took...    [20, 31]  when he came
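
The b+1 is there because Python slices exclude the stop index, while the end values in begin_end are meant to be inclusive. A minimal sketch on shorter, made-up strings may make that easier to see (the sample data and the helper name slice_inclusive are illustrative assumptions, not part of the question):

```python
import pandas as pd

def slice_inclusive(text, begin, end):
    """Return the characters from begin to end, with end included."""
    # Python slices stop *before* the stop index, hence end + 1.
    return text[begin:end + 1]

# Made-up sample data, shorter than the text in the question.
df_demo = pd.DataFrame({
    'text': ['hello world', 'pandas rocks'],
    'begin_end': [[0, 4], [7, 11]],
})

# Same pattern as above: walk both columns in parallel.
df_demo['new_col'] = [
    slice_inclusive(s, a, b)
    for s, (a, b) in zip(df_demo['text'], df_demo['begin_end'])
]
print(df_demo['new_col'].tolist())
```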

Another simple option is a helper function combined with DataFrame.apply. First build the DataFrame:

    import pandas as pd
    data = pd.DataFrame({'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
                    'Mom called dad, and when he came home, he took moms car and drove to the store'],
           'begin_end':[[128, 139],[20,31]]})
    data
Output:

                                                    text   begin_end
    0  They say that all cats land on their feet, but...  [128, 139]
    1  Mom called dad, and when he came home, he took...    [20, 31]

Then define the function and apply it row by row:

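def getString(string, location):
    # Slice only when begin < end; more conditions can be added here.
    if location[0] < location[1]:
        # +1 because the end index is meant to be inclusive
        return string[location[0]:location[1] + 1]
    return None  # invalid range: leave the cell empty

data['new_col'] = data.apply(lambda x: getString(x['text'], x['begin_end']), axis=1)
data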

Output:

                                                text   begin_end       new_col
0  They say that all cats land on their feet, but...  [128, 139]  have visited
1  Mom called dad, and when he came home, he took...    [20, 31]  when he came
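
As one example of the extra conditions mentioned in the comment, the helper could also reject ranges that fall outside the string. The following is only a sketch: the name get_string_checked, the specific checks, and the sample data are assumptions made for illustration.

```python
import pandas as pd

def get_string_checked(string, location):
    """Slice string from begin to end inclusive, or return None for a bad range."""
    begin, end = location
    # Assumed conditions for this sketch: non-negative start,
    # begin before end, and end inside the string.
    if 0 <= begin < end < len(string):
        return string[begin:end + 1]
    return None

# Made-up data: the second range runs past the end of its string.
demo = pd.DataFrame({
    'text': ['hello world', 'short'],
    'begin_end': [[6, 10], [2, 50]],
})
demo['new_col'] = demo.apply(
    lambda row: get_string_checked(row['text'], row['begin_end']), axis=1
)
print(demo)
```

Rows with an invalid range end up with a missing value in new_col instead of a silently truncated string.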

Alternatively, split begin_end into separate begin and end columns first:

df['begin'] = df['begin_end'].apply(lambda x: x[0])
df['end'] = df['begin_end'].apply(lambda x: x[1])
df['new_col'] = df.apply(lambda x: x['text'][x['begin']:x['end']+1], axis=1)

Output:

 text    begin_end  begin  end       new_col
0 ...   [128, 139]    128  139  have visited
1 ...     [20, 31]     20   31  when he came

If begin and end are already stored as separate columns, you do not need to extract them from begin_end and can skip the first two lines. Unless you really need it, avoid storing lists in a pd.Series.
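
If the list column already exists, one possible way to get rid of it is to expand it into two integer columns in a single step. This is a sketch under assumptions: the sample data is made up, and the column names begin and end simply mirror the ones used above.

```python
import pandas as pd

# Made-up sample data with the bounds stored as lists.
frame = pd.DataFrame({
    'text': ['hello world', 'pandas rocks'],
    'begin_end': [[0, 4], [7, 11]],
})

# Expand the two-element lists into two integer columns,
# keeping the original index so the join lines up row by row.
bounds = pd.DataFrame(
    frame['begin_end'].tolist(), index=frame.index, columns=['begin', 'end']
)
frame = frame.drop(columns='begin_end').join(bounds)

# Slice with the inclusive end index, as in the answers above.
frame['new_col'] = [
    s[a:b + 1] for s, a, b in zip(frame['text'], frame['begin'], frame['end'])
]
print(frame)
```

With plain integer columns, the bounds can be filtered, compared, and saved like any other numeric data.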
