Extract words from text into a new column by index (Pandas, Python)

Question

I have a dataframe:

import pandas as pd

data = {'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
                'Mom called dad, and when he came home, he took moms car and drove to the store'],
       'begin_end':[[128, 139],[20,31]]}
        
df = pd.DataFrame(data)

I want to use the [begin, end] pair in the begin_end column to extract the corresponding words from the text column into a new column, e.g. text[128:139+1]. The result should be:

    begin_end     new_col
0   [128, 139]  have visited
1   [20, 31]    when he came
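The begin_end pairs are inclusive on both ends, while a Python slice excludes its stop index, which is why the question adds 1 to the end. A minimal check in plain Python, using the second sample string:

```python
# Second sample string from the question
text = 'Mom called dad, and when he came home, he took moms car and drove to the store'
begin, end = 20, 31  # inclusive [begin, end] pair, as stored in begin_end

# A slice excludes its stop index, so add 1 to keep text[end]
print(repr(text[begin:end + 1]))  # 'when he came'
print(repr(text[begin:end]))      # 'when he cam' (last character dropped)
```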

Answer 1

Because the bounds differ from row to row, you need a loop; a list comprehension over zip is the simplest:

df['new_col'] = [s[a:b+1] for s, (a,b) in zip(df['text'], df['begin_end'])]

Output:

                                                text   begin_end       new_col
0  They say that all cats land on their feet, but...  [128, 139]  have visited
1  Mom called dad, and when he came home, he took...    [20, 31]  when he came
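The loop is needed because the vectorized `Series.str.slice` accepts only scalar `start`/`stop` arguments, so it applies the same bounds to every row. A small sketch contrasting the two on the question's data:

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
             'Mom called dad, and when he came home, he took moms car and drove to the store'],
    'begin_end': [[128, 139], [20, 31]],
})

# str.slice uses one (start, stop) pair for every row,
# so only the second row happens to get the intended words here
same_bounds = df['text'].str.slice(20, 32)

# Per-row bounds need a Python-level loop; end is inclusive, hence b + 1
per_row = [s[a:b + 1] for s, (a, b) in zip(df['text'], df['begin_end'])]

print(same_bounds.tolist())
print(per_row)  # ['have visited', 'when he came']
```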

Answer 2

You can also do this simply with apply:

    import pandas as pd
    data = pd.DataFrame({'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
                    'Mom called dad, and when he came home, he took moms car and drove to the store'],
           'begin_end':[[128, 139],[20,31]]})
    data
Output:

                                                    text   begin_end
    0  They say that all cats land on their feet, but...  [128, 139]
    1  Mom called dad, and when he came home, he took...    [20, 31]

Then apply a function row by row:

    def getString(string, location):
        # Slice only when begin < end (you can add more checks here);
        # otherwise the function returns None
        if location[0] < location[1]:
            return string[location[0]:location[1]+1]

    data['new_col'] = data.apply(lambda x: getString(x['text'], x['begin_end']), axis=1)
    data

Output:

                                                    text   begin_end       new_col
    0  They say that all cats land on their feet, but...  [128, 139]  have visited
    1  Mom called dad, and when he came home, he took...    [20, 31]  when he came
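Note that getString falls through and returns None when the begin < end check fails, so such rows end up missing in new_col. A sketch showing this, with a hypothetical reversed pair [31, 20] added only to trigger the check:

```python
import pandas as pd

def getString(string, location):
    # Slice only when begin < end; otherwise the function implicitly returns None
    if location[0] < location[1]:
        return string[location[0]:location[1] + 1]

data = pd.DataFrame({
    'text': ['Mom called dad, and when he came home, he took moms car and drove to the store'] * 2,
    # The second pair is a hypothetical reversed range, not from the question
    'begin_end': [[20, 31], [31, 20]],
})
data['new_col'] = data.apply(lambda x: getString(x['text'], x['begin_end']), axis=1)

print(data['new_col'].iloc[0])           # when he came
print(data['new_col'].isna().tolist())   # [False, True]
```

If missing values are unwanted downstream, return an explicit default (for example an empty string) at the end of getString instead.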

Answer 3

Try this:

df['begin'] = df['begin_end'].apply(lambda x: x[0])
df['end'] = df['begin_end'].apply(lambda x: x[1])
df['new_col'] = df.apply(lambda x: x['text'][x['begin']:x['end']+1], axis=1)

Output:

 text    begin_end  begin  end       new_col
0 ...   [128, 139]    128  139  have visited
1 ...     [20, 31]     20   31  when he came

If begin and end are already stored in separate columns, you do not need to extract them from begin_end. Unless it is necessary, avoid storing lists in a pd.Series.
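Following that advice, a minimal sketch that splits begin_end once into two integer columns named begin and end (the names Answer 3 uses), drops the list column, and then slices with zip:

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['They say that all cats land on their feet, but this does not apply to my cat. He not only often falls, but also jumps badly. We have visited the veterinarian more than once with dislocated paws and damaged internal organs.',
             'Mom called dad, and when he came home, he took moms car and drove to the store'],
    'begin_end': [[128, 139], [20, 31]],
})

# Expand the list column into two integer columns in one step
bounds = pd.DataFrame(df['begin_end'].tolist(), index=df.index, columns=['begin', 'end'])
df = df.drop(columns='begin_end').join(bounds)

# end is inclusive, so the slice stop is end + 1
df['new_col'] = [s[b:e + 1] for s, b, e in zip(df['text'], df['begin'], df['end'])]
print(df['new_col'].tolist())  # ['have visited', 'when he came']
```

Once begin and end are plain integer columns, later steps can filter or validate them with ordinary vectorized comparisons instead of unpacking lists row by row.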

3 answers

Answer 1: score 2, accepted, 2022-07-07 11:53:59

Answer 2: score 1, 2022-07-07 12:09:45

Answer 3: score 0, 2022-07-07 12:04:19
