pandas: get the exact corresponding value with the corresponding index based on a value in another column

Question

I have a column of strings (sentences) and a column of lists of strings (POS tags), as follows:

df = pd.DataFrame({ 'text':['the weather is nice though', 'How are you today','the beautiful girl and the nice boy'],
'pos':[['DET', 'NOUN', 'VERB','ADJ', 'ADV'],['QUA', 'VERB', 'PRON', 'ADV'], ['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]]})

I would like to compare the two columns and create a third column: wherever the 'pos' list contains the value 'ADJ', I want to find the corresponding word in the 'text' column (in the first row, that is 'nice') and return it together with its index as a dictionary. This is how the third column should look:

third_column:

1 {'nice' : 3}
2 {}
3 {'beautiful': 1, 'nice': 5}

So far I have tried the following:

df['Third_column']= ' '
df['liststring'] = [' '.join(map(str, l)) for l in df['pos']]
df.loc[df['liststring'].str.contains('ADJ'),'text']

but I do not know how to proceed to get the exact word and its index.

Answer 1

What you describe is exactly what pandas.DataFrame.apply does.

Whenever you need to compute a new column from the values of other columns row by row, apply with axis=1 is worth considering.

import pandas as pd


def extract_words(row):
    # Map each adjective in the sentence to its 0-based token position.
    word_pos = {}
    text_split = row.text.split()
    for i, p in enumerate(row.pos):
        if p == 'ADJ':
            word_pos[text_split[i]] = i
    return word_pos


df = ...
df['Third_column'] = df.apply(extract_words, axis=1)
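For reference, here is a minimal self-contained sketch of the same idea, run against the sample data from the question. The helper name adj_positions and its tag parameter are illustrative, not part of the answer above. It pairs words and tags with zip rather than indexing, an assumption made here so that a row whose word count and tag count differ does not raise an IndexError. Positions are 0-based, so 'nice' in the third sentence lands at 5.

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        "the weather is nice though",
        "How are you today",
        "the beautiful girl and the nice boy",
    ],
    "pos": [
        ["DET", "NOUN", "VERB", "ADJ", "ADV"],
        ["QUA", "VERB", "PRON", "ADV"],
        ["DET", "ADJ", "NOUN", "CON", "DET", "ADJ", "NOUN"],
    ],
})


def adj_positions(row, tag="ADJ"):
    """Return {word: 0-based position} for every token tagged `tag`."""
    words = row["text"].split()
    # zip stops at the shorter of the two sequences, so mismatched
    # lengths are silently truncated instead of raising.
    return {
        word: i
        for i, (word, p) in enumerate(zip(words, row["pos"]))
        if p == tag
    }


# axis=1 passes each row to the function as a Series.
df["Third_column"] = df.apply(adj_positions, axis=1)
print(df["Third_column"].tolist())
```

Because the words are dictionary keys, an adjective that occurs twice in one sentence keeps only its last position.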

Answer 2

I would do something along the lines of:

Getting the words and POS tags into individual (synced) columns:
```
df['text'] = df.text.str.split()
df = df.apply(pd.Series.explode)
```
```
      text   pos
0      the   DET
0  weather  NOUN
0       is  VERB
0     nice   ADJ
0   though   ADV
```
(Note: Having lists, dictionaries, and other sequences as cells is mostly a sign that you need to restructure your data.)

Resetting the index, keeping the original index as 'sent_id' and adding sentence-wise indices to the tokens:

 df['sent_id'] = df.index
 df = df.reset_index(drop=True)
 df['tok_id'] = df.groupby('sent_id').cumcount()

       text   pos  sent_id  tok_id
 0      the   DET        0       0
 1  weather  NOUN        0       1
 2       is  VERB        0       2
 3     nice   ADJ        0       3
 4   though   ADV        0       4
 5      How   QUA        1       0
 6      are  VERB        1       1
 7      you  PRON        1       2

Finally, getting all the 'ADJ' rows:

 df[df.pos.eq('ADJ')]

          text  pos  sent_id  tok_id
 3        nice  ADJ        0       3
 10  beautiful  ADJ        2       1
 14       nice  ADJ        2       5
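If the dictionary column from the question is still needed, the long-format table can be folded back into one dictionary per sentence. The sketch below is one possible way to do that, not part of the answer itself. It assumes pandas 1.3 or later, where DataFrame.explode accepts a list of columns, and the names tokens, adj and by_sentence are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        "the weather is nice though",
        "How are you today",
        "the beautiful girl and the nice boy",
    ],
    "pos": [
        ["DET", "NOUN", "VERB", "ADJ", "ADV"],
        ["QUA", "VERB", "PRON", "ADV"],
        ["DET", "ADJ", "NOUN", "CON", "DET", "ADJ", "NOUN"],
    ],
})

# Long format: one row per token, with words and tags kept in sync.
tokens = df.assign(text=df["text"].str.split()).explode(["text", "pos"])
tokens["sent_id"] = tokens.index
tokens = tokens.reset_index(drop=True)
tokens["tok_id"] = tokens.groupby("sent_id").cumcount()

adj = tokens[tokens["pos"].eq("ADJ")]

# Start every sentence with an empty dict so that sentences
# without adjectives still produce {}.
by_sentence = {sent_id: {} for sent_id in df.index}
for row in adj.itertuples(index=False):
    by_sentence[row.sent_id][row.text] = int(row.tok_id)

df["Third_column"] = [by_sentence[sent_id] for sent_id in df.index]
print(df["Third_column"].tolist())
```

Keeping the original df untouched and building tokens alongside it means the result can be assigned straight back by sentence index.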
