Tokenize words and get the elements right before and after a given word

Question

My DataFrame had a column of strings (Col A). I tokenized it, and now I have:

Input:

Col A
'A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C'
'A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D'

I want to get the 2 items right before and the 2 items right after the word 'dog' and store them, together with 'dog', in a new column, Col B. In other words, I want something like this:

Output:

Col B
'B', 'C', 'dog', 'C', 'C'
'B', 'B', 'dog', 'D', 'A'

How do I get that?

Answer 1

This assumes every row contains exactly one 'dog':

import pandas as pd


df = pd.DataFrame({'Col A': ["'A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C'", "'A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D'"]})

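def extract(l):
    # Strip the whitespace left over from splitting on ','
    l = [e.strip() for e in l]
    # Position of the quoted token 'dog'; raises ValueError if it is absent
    idx = l.index("'dog'")
    # Clamp the start at 0 so a 'dog' near the beginning doesn't yield a negative slice index
    return l[max(idx - 2, 0):idx + 3]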

df['Col B'] = df['Col A'].str.split(',').apply(extract)

print(df)

                                           Col A                        Col B
0       'A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C'  ['B', 'C', 'dog', 'C', 'C']
1  'A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D'  ['B', 'B', 'dog', 'D', 'A']
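The code above works on the comma-separated strings used in the example. Since the question says Col A was tokenized, its cells may instead already be Python lists. The following is a minimal sketch under that assumption; window_around and its word and k parameters are hypothetical names, not part of pandas. Like l.index, it only looks at the first occurrence of the word, but it returns an empty list instead of raising ValueError when the word is missing. The start index is clamped at 0 because a negative slice start counts from the end of the list (for ['dog', 'E', 'F'], tokens[-2:3] gives ['E', 'F'] and loses 'dog').

```python
import pandas as pd


def window_around(tokens, word="dog", k=2):
    """Return up to k tokens before and after the first occurrence of word.

    Hypothetical helper: returns [] when word is absent instead of raising.
    """
    if word not in tokens:
        return []
    idx = tokens.index(word)
    # Clamp the start at 0: a negative start would wrap around to the end.
    start = max(idx - k, 0)
    # Slicing past the end of a list is safe, so the stop needs no clamp.
    return tokens[start:idx + k + 1]


# Hypothetical sample data: Col A already holds token lists, not strings.
df = pd.DataFrame({
    "Col A": [
        ["A", "B", "C", "dog", "C", "C", "C", "C"],
        ["A", "B", "B", "dog", "D", "A", "C", "C", "D"],
        ["dog", "E", "F"],
        ["no", "match", "here"],
    ]
})

# Series.apply calls window_around once per cell and keeps each list intact.
df["Col B"] = df["Col A"].apply(window_around)
print(df["Col B"].tolist())
# [['B', 'C', 'dog', 'C', 'C'], ['B', 'B', 'dog', 'D', 'A'], ['dog', 'E', 'F'], []]
```

To use a different window size or target word, pass them through apply, e.g. df["Col A"].apply(window_around, word="cat", k=3).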

solution1
0 2021-05-11 04:33:51
