
Tokenize words and get the elements right before and after a given word

My dataframe had a column of strings (col A). I tokenized it and now I have:

Input:

Col A
'A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C'
'A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D'

I want a column B that holds the word 'dog' together with the 2 items right before it and the 2 items right after it. Something like this:

Output:

Col B
'B', 'C', 'dog', 'C', 'C'
'B', 'B', 'dog', 'D', 'A'

How do I get that?

This works as long as each row contains exactly one 'dog'. If a row has several, only the first is used; if it has none, index() raises a ValueError.

import pandas as pd


df = pd.DataFrame({'Col A': ["'A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C'", "'A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D'"]})

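def extract(tokens):
    # Strip the whitespace left around each token by the split on ','
    tokens = [t.strip() for t in tokens]
    # Position of the first 'dog' (the tokens still carry their quotes)
    idx = tokens.index("'dog'")
    # Two tokens before through two tokens after, clamped at the list start
    return tokens[max(idx - 2, 0):idx + 3]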

df['Col B'] = df['Col A'].str.split(',').apply(extract)
print(df)

                                           Col A                        Col B
0       'A', 'B', 'C', 'dog', 'C', 'C', 'C', 'C'  ['B', 'C', 'dog', 'C', 'C']
1  'A', 'B', 'B', 'dog', 'D', 'A', 'C', 'C', 'D'  ['B', 'B', 'dog', 'D', 'A']
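
If Col A already holds lists of tokens (as the question suggests) and a row may contain 'dog' several times or not at all, a variant along these lines might be more robust. This is only a sketch: the name window_around and its target and n parameters are made up for illustration and are not part of the answer above.

```python
def window_around(tokens, target="dog", n=2):
    """Return one window per occurrence of target.

    Each window holds up to n tokens before the target, the target
    itself, and up to n tokens after it. Rows without the target
    yield an empty list instead of raising an error.
    """
    windows = []
    for i, tok in enumerate(tokens):
        if tok == target:
            start = max(i - n, 0)  # clamp so the slice never wraps around
            windows.append(tokens[start:i + n + 1])
    return windows


# Sample rows: the two from the question, plus two edge cases
rows = [
    ["A", "B", "C", "dog", "C", "C", "C", "C"],
    ["A", "B", "B", "dog", "D", "A", "C", "C", "D"],
    ["dog", "X", "Y", "dog"],  # two occurrences, both near an edge
    ["A", "B"],                # no occurrence
]
results = [window_around(r) for r in rows]
```

On a dataframe whose column already contains lists, the same function could be applied with df['Col B'] = df['Col A'].apply(window_around). Note that each cell then holds a list of windows rather than a single window.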
