Extract word with NN tag from tuple in a list
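I am trying to extract the 0th element of every tuple that has the "NN" tag. I only want to extract words based on their tag. For example, each row looks like this: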
train['Tag'] = [('unclear', 'JJ'), ('incomplete', 'JJ'), ('instruction', 'NN'), ('given', 'VBN')]
I tried using a filter (where-style) clause to extract the first element of each tuple:
train['Tagged2']= [x[0] for x in train['Tag'] if x[1] in ("NN")]
Expected result: a new column that contains, for each row, the words tagged NN. In the example above that would be the word "instruction".
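==: the condition is true if the values of the two operands are equal.
in: evaluates to true if the value is found in the specified sequence, and to false otherwise.
Therefore, use the comparison operator == rather than in: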
tt = [('unclear', 'JJ'), ('incomplete', 'JJ'), ('instruction', 'NN'), ('given', 'VBN')]
print([t[0] for t in tt if t[1] == 'NN'])
Output:
['instruction']
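As a side note on why the original condition behaves unexpectedly: in Python, ("NN") without a trailing comma is just the string "NN", so in performs a substring test rather than a membership test. The sketch below compares the three variants on the sample data; the variable names are only illustrative.

```python
tagged = [('unclear', 'JJ'), ('incomplete', 'JJ'), ('instruction', 'NN'), ('given', 'VBN')]

# ("NN") is the plain string "NN", so `in` checks for a substring here.
substring_match = [word for word, tag in tagged if tag in ("NN")]

# A trailing comma makes a real one-element tuple, so `in` checks membership.
tuple_match = [word for word, tag in tagged if tag in ("NN",)]

# Plain equality is the most direct way to match a single tag.
equality_match = [word for word, tag in tagged if tag == "NN"]

print(substring_match, tuple_match, equality_match)

# The substring test also accepts partial tags that the other two reject.
partial_tag = "N"
print(partial_tag in ("NN"), partial_tag in ("NN",), partial_tag == "NN")
```

All three comprehensions happen to return ['instruction'] for this data, but only the substring variant would also accept a tag such as "N", which is why == (or a real tuple) is the safer choice.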
EDIT:
Since you have updated your question:
train = {} # Assuming that you're working with associative arrays i.e. dict in Py
train['Tag'] = [('unclear', 'JJ'), ('incomplete', 'JJ'), ('instruction', 'NN'), ('given', 'VBN')]
print([t[0] for t in train['Tag'] if t[1] == 'NN'])
Output:
['instruction']
Since you need to create a new pandas column based on a condition, you can use the following code to keep only the words tagged NN:
import numpy as np
import pandas as pd

df = pd.DataFrame()
df['Tag'] = [('unclear', 'JJ'), ('incomplete', 'JJ'), ('instruction', 'NN'), ('given', 'VBN')]
# create 2 separate columns with tags and words
df['words'] = [i[0] for i in df['Tag']]
df['tags'] = [i[1] for i in df['Tag']]
# use np.where to find tags with `NN`
df['Tagged2'] = np.where(df['tags']=='NN', df['words'], np.nan)
# drop the helper columns (the positional axis argument is not accepted by pandas 2.x)
df.drop(columns=['words', 'tags'], inplace=True)
print(df)
Output:
                 Tag      Tagged2
0      (unclear, JJ)          NaN
1   (incomplete, JJ)          NaN
2  (instruction, NN)  instruction
3       (given, VBN)          NaN
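The question describes a column in which every row holds its own list of (word, tag) tuples, whereas the frame above has one tuple per row. If the data really is one list per row, a per-row apply may be closer to what is needed. This is a minimal sketch under that assumption; the second sample row and the helper name words_with_tag are made up for illustration.

```python
import pandas as pd

# Hypothetical frame: each cell of 'Tag' is a list of (word, tag) tuples.
train = pd.DataFrame({
    'Tag': [
        [('unclear', 'JJ'), ('incomplete', 'JJ'), ('instruction', 'NN'), ('given', 'VBN')],
        [('good', 'JJ'), ('service', 'NN'), ('today', 'NN')],
    ]
})

def words_with_tag(pairs, wanted='NN'):
    """Return the words in one row whose tag equals `wanted`."""
    return [word for word, tag in pairs if tag == wanted]

# Apply the filter row by row; each result cell is a list of matching words.
train['Tagged2'] = train['Tag'].apply(words_with_tag)
print(train['Tagged2'].tolist())
```

Each cell of Tagged2 is then a list, so a row with several NN words keeps all of them and a row with none gets an empty list.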
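# requires `import nltk`; tags each 'subclause' string and keeps the words whose tag starts with 'N'
train['Tagged3'] = train['subclause'].apply(lambda x: ' '.join([word for (word, pos) in nltk.pos_tag(nltk.word_tokenize(x)) if pos[0] == 'N']))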
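The pos[0] == 'N' test in the line above is broader than == 'NN': it keeps every Penn Treebank noun tag (NN, NNS, NNP, NNPS). The sketch below shows the same filter on an already tagged list, so it runs without NLTK or its data files; the sample tuples are invented for illustration.

```python
# Hypothetical pre-tagged tokens using Penn Treebank tags.
tagged = [
    ('instructions', 'NNS'),
    ('were', 'VBD'),
    ('unclear', 'JJ'),
    ('London', 'NNP'),
    ('office', 'NN'),
]

# Keep every word whose tag starts with 'N', i.e. any noun tag.
nouns = ' '.join(word for word, pos in tagged if pos.startswith('N'))
print(nouns)

# For comparison, strict equality keeps only singular common nouns.
only_nn = [word for word, pos in tagged if pos == 'NN']
print(only_nn)
```

Whether the prefix test or strict equality is the right choice depends on whether plural and proper nouns should count as matches.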