pandas: get the exact corresponding words, with their indices, based on values in another column

Question

I have a column of strings (sentences) and a column of lists of strings (POS tags), like this:

```python
import pandas as pd

df = pd.DataFrame({'text': ['the weather is nice though', 'How are you today', 'the beautiful girl and the nice boy'],
                   'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'], ['QUA', 'VERB', 'PRON', 'ADV'], ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']]})
```

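I want to compare these columns and create a third column: whenever the 'pos' column contains the value 'ADJ', find the corresponding word in the 'text' column (in the first row, that is 'nice') and return it together with its index as a dictionary. The third column should look like this: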

```
third_column:

1 {'nice': 3}
2 {}
3 {'beautiful': 1, 'nice': 5}
```

So far I have tried the following:

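```python
df['Third_column'] = ' '
# Join each tag list into one space-separated string, e.g. 'DET NOUN VERB ADJ ADV'
df['liststring'] = [' '.join(map(str, l)) for l in df['pos']]
# Selects the sentences whose tags include 'ADJ', but not the words or their positions
df.loc[df['liststring'].str.contains('ADJ'), 'text']
```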

but I don't know how to go on from here to get the exact words and their indices.

Answer 1

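What you describe is exactly what pandas.DataFrame.apply does.

Whenever you want to compute a column (or row) from other columns (or rows) in pandas, this is the approach to consider.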

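```python
import pandas as pd


def extract_words(row):
    # Map each word tagged 'ADJ' to its position in the sentence
    word_pos = {}
    text_split = row.text.split()  # whitespace tokens, assumed aligned with row.pos
    for i, p in enumerate(row.pos):
        if p == 'ADJ':
            word_pos[text_split[i]] = i
    return word_pos


df = ...  # the DataFrame from the question
# axis=1 passes each row (as a Series) to extract_words
df['Third_column'] = df.apply(extract_words, axis=1)
```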
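For reference, here is a self-contained sketch of the same `apply` approach run on the question's sample data. The helper name `adjectives_with_index` and its `tag` parameter are illustrative, not part of the answer. It pairs words and tags with `zip`, which stops at the shorter sequence instead of raising `IndexError` if the tokenization and the tag list differ in length. Like the original, it assumes whitespace splitting matches the tags one to one, and a repeated adjective keeps only its last position because dictionary keys are unique.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'text': ['the weather is nice though',
             'How are you today',
             'the beautiful girl and the nice boy'],
    'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'],
            ['QUA', 'VERB', 'PRON', 'ADV'],
            ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']],
})


def adjectives_with_index(row, tag='ADJ'):
    """Hypothetical helper: map each word tagged `tag` to its token position."""
    words = row['text'].split()  # assumes whitespace tokens line up with row['pos']
    # zip pairs word i with tag i and stops at the shorter of the two lists
    return {word: i
            for i, (word, pos) in enumerate(zip(words, row['pos']))
            if pos == tag}


# axis=1 calls the function once per row; each returned dict becomes one cell
df['Third_column'] = df.apply(adjectives_with_index, axis=1)
print(df['Third_column'].tolist())
# [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
```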

Answer 2

I would do the following:

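Split the sentences into words and explode both columns, so that each word and its POS tag sit in the same row of separate, synchronized columns: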
```python
df['text'] = df.text.str.split()
df = df.apply(pd.Series.explode)
```
```
      text   pos
0      the   DET
0  weather  NOUN
0       is  VERB
0     nice   ADJ
0   though   ADV
```
(Note: having lists, dicts and other sequences as cell values usually indicates that you need to restructure your data.)
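On pandas 1.3 or later, `DataFrame.explode` also accepts a list of columns, so the two list columns can be exploded together in one call instead of going through `apply`. A minimal sketch on the question's data; it assumes each sentence splits into exactly as many words as it has tags, since a multi-column explode requires matching list lengths in every row.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'text': ['the weather is nice though',
             'How are you today',
             'the beautiful girl and the nice boy'],
    'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'],
            ['QUA', 'VERB', 'PRON', 'ADV'],
            ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']],
})

# Split each sentence into a word list, then explode both list columns together
# (a list of columns needs pandas >= 1.3 and equal list lengths per row)
tokens = df.assign(text=df['text'].str.split()).explode(['text', 'pos'])
print(tokens.head())
```

The original index is repeated for every token of a sentence, which is what the next step turns into a sentence id.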

Reset the index, keeping the original index as 'sent_id', and add each token's position within its sentence as 'tok_id':

```python
df['sent_id'] = df.index
df = df.reset_index(drop=True)
df['tok_id'] = df.groupby('sent_id').cumcount()
```

```
      text   pos  sent_id  tok_id
0      the   DET        0       0
1  weather  NOUN        0       1
2       is  VERB        0       2
3     nice   ADJ        0       3
4   though   ADV        0       4
5      How   QUA        1       0
6      are  VERB        1       1
7      you  PRON        1       2
```

Finally, select all 'ADJ' rows:

```python
df[df.pos.eq('ADJ')]
```

```
         text  pos  sent_id  tok_id
3        nice  ADJ        0       3
10  beautiful  ADJ        2       1
14       nice  ADJ        2       5
```
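The question asked for one dictionary per sentence, so the 'ADJ' rows can be folded back into that shape. This is a sketch, not part of the answer: it builds the same token table, but uses `explode(['text', 'pos'])` (pandas 1.3 or later) in place of `apply(pd.Series.explode)`, and assumes whitespace tokens align with the tags. The names `tokens`, `adj` and `per_sentence` are illustrative. Sentences without an adjective get an empty dict.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'text': ['the weather is nice though',
             'How are you today',
             'the beautiful girl and the nice boy'],
    'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'],
            ['QUA', 'VERB', 'PRON', 'ADV'],
            ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']],
})

# Token table with sentence and token ids (multi-column explode needs pandas >= 1.3)
tokens = df.assign(text=df['text'].str.split()).explode(['text', 'pos'])
tokens['sent_id'] = tokens.index
tokens = tokens.reset_index(drop=True)
tokens['tok_id'] = tokens.groupby('sent_id').cumcount()

adj = tokens[tokens['pos'].eq('ADJ')]

# Fold the ADJ rows back into one {word: tok_id} dict per sentence;
# int() turns NumPy integers into plain Python ints
per_sentence = {
    sent_id: {word: int(tok) for word, tok in zip(group['text'], group['tok_id'])}
    for sent_id, group in adj.groupby('sent_id')
}
# Sentences with no ADJ are absent from per_sentence and get an empty dict
df['Third_column'] = [per_sentence.get(i, {}) for i in df.index]
print(df['Third_column'].tolist())
# [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
```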


2 answers

Answer 1
1 · accepted · 2021-07-03 11:32:05

Answer 2
0 · 2021-07-03 11:47:05
