pandas: get the exact corresponding value with the corresponding index based on a value in another column
I have a column of strings (sentences) and a column of lists of strings, as follows:
import pandas as pd

df = pd.DataFrame({
    'text': ['the weather is nice though', 'How are you today', 'the beautiful girl and the nice boy'],
    'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'], ['QUA', 'VERB', 'PRON', 'ADV'], ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']],
})
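I would like to compare these columns and create a third column: wherever the 'pos' column contains the value 'ADJ', I want to find the corresponding word in the 'text' column (in the first row that is 'nice') and return it together with its index as a dictionary. This is what the third column should look like: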
third_column:
1 {'nice' : 3}
2 {}
3 {'beautiful':1, 'nice':6}
So far I have tried the following:
df['Third_column']= ' '
df['liststring'] = [' '.join(map(str, l)) for l in df['pos']]
df.loc[df['liststring'].str.contains('ADJ'),'text']
but I don't know how to proceed to get the exact word and its index.
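What you are describing is exactly what pandas.DataFrame.apply does.
Whenever you want to compute a column (or row) from other columns (or rows) in pandas, this is the approach to consider.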
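import pandas as pd

def extract_words(row):
    # Map each word tagged 'ADJ' to its zero-based position in the sentence.
    word_pos = {}
    text_splited = row['text'].split()
    for i, p in enumerate(row['pos']):
        if p == 'ADJ':
            word_pos[text_splited[i]] = i
    return word_pos

df = ...
# axis=1 passes each row to the function, one at a time
df['Third_column'] = df.apply(extract_words, axis=1)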
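For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea using the sample data from the question. The function name `adjective_positions` and its `tag` parameter are illustrative choices, not part of the answer above, and pairing words with tags via `zip` is a small variation that assumes silently truncating is acceptable if a row has a different number of words and tags.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['the weather is nice though',
             'How are you today',
             'the beautiful girl and the nice boy'],
    'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'],
            ['QUA', 'VERB', 'PRON', 'ADV'],
            ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']],
})


def adjective_positions(row, tag='ADJ'):
    """Map each word tagged `tag` to its zero-based position in the sentence."""
    words = row['text'].split()
    # zip stops at the shorter sequence, so a word/tag length mismatch
    # cannot raise IndexError (hypothetical safeguard, not in the answer).
    return {word: i
            for i, (word, p) in enumerate(zip(words, row['pos']))
            if p == tag}


df['Third_column'] = df.apply(adjective_positions, axis=1)
print(df['Third_column'].tolist())
# [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
```

Positions are zero-based, so 'nice' in the third sentence comes out as 5 rather than the 6 shown in the question. Also note that dictionary keys are unique: if the same adjective occurs twice in one sentence, only its last position is kept.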
I would do a few things:
Put the words and the POS tags into separate (aligned) columns:
df['text'] = df.text.str.split()
df = df.apply(pd.Series.explode)

      text   pos
0      the   DET
0  weather  NOUN
0       is  VERB
0     nice   ADJ
0   though   ADV
(Note: having lists, dictionaries and other sequences as cell values is usually a sign that you need to restructure your data.)
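As a side note, newer pandas versions (1.3 and later) let `DataFrame.explode` take a list of columns, which could replace the `apply(pd.Series.explode)` idiom for this step. The sketch below is an alternative, not what the answer uses; the name `long_df` is made up for illustration, and it assumes every row has the same number of words and tags, since pandas raises a `ValueError` otherwise.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['the weather is nice though',
             'How are you today',
             'the beautiful girl and the nice boy'],
    'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'],
            ['QUA', 'VERB', 'PRON', 'ADV'],
            ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']],
})

# Turn each sentence into a list of words so both columns hold lists.
df['text'] = df['text'].str.split()

# Explode both list columns in lockstep (requires pandas >= 1.3).
# The original row label is repeated for every token of a sentence.
long_df = df.explode(['text', 'pos'])
print(long_df.head())
```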
Reset the index, keeping the original index as 'sent_id', and add each token's position within its sentence:
df['sent_id'] = df.index
df = df.reset_index(drop=True)
df['tok_id'] = df.groupby('sent_id').cumcount()

      text   pos  sent_id  tok_id
0      the   DET        0       0
1  weather  NOUN        0       1
2       is  VERB        0       2
3     nice   ADJ        0       3
4   though   ADV        0       4
5      How   QUA        1       0
6      are  VERB        1       1
7      you  PRON        1       2
Finally, get all the 'ADJ' rows:
df[df.pos.eq('ADJ')]
         text  pos  sent_id  tok_id
3        nice  ADJ        0       3
10  beautiful  ADJ        2       1
14       nice  ADJ        2       5
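If the per-sentence dictionaries from the question are still needed, the long table can be folded back into them. The following is one possible sketch, not part of the answer above: it repeats the three steps on the sample data and then collects the 'ADJ' rows with a plain loop. The names `tokens`, `adj` and `result` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'text': ['the weather is nice though',
             'How are you today',
             'the beautiful girl and the nice boy'],
    'pos': [['DET', 'NOUN', 'VERB', 'ADJ', 'ADV'],
            ['QUA', 'VERB', 'PRON', 'ADV'],
            ['DET', 'ADJ', 'NOUN', 'CON', 'DET', 'ADJ', 'NOUN']],
})

# Steps 1 and 2: one row per token, with sentence id and in-sentence position.
tokens = df.assign(text=df['text'].str.split()).apply(pd.Series.explode)
tokens['sent_id'] = tokens.index
tokens = tokens.reset_index(drop=True)
tokens['tok_id'] = tokens.groupby('sent_id').cumcount()

# Step 3: keep only the adjectives.
adj = tokens[tokens['pos'].eq('ADJ')]

# Fold back: start with an empty dict for every sentence so that
# sentences without adjectives still get {}.
result = {sent_id: {} for sent_id in df.index}
for word, sent_id, tok_id in zip(adj['text'], adj['sent_id'], adj['tok_id']):
    result[sent_id][word] = int(tok_id)

df['Third_column'] = [result[sent_id] for sent_id in df.index]
print(df['Third_column'].tolist())
# [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
```

Keeping the long table around is usually the more flexible choice. The dictionaries are only worth rebuilding if a downstream consumer really expects that shape.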