
ConditionalFreqDist to find most frequent POS tags for words

I am trying to find the most frequent POS tag for each word in the dataset, but I am struggling with the ConditionalFreqDist part.

import nltk
tw = nltk.corpus.brown.tagged_words()

train_idx = int(0.8*len(tw))
training_set = tw[:train_idx]
test_set = tw[train_idx:]

words= list(zip(*training_set))[0]

from nltk import ConditionalFreqDist
ofd= ConditionalFreqDist(word for word in list(zip(*training_set))[0])

tags= list(zip(*training_set))[1]
ofd.tabulate(conditions= words, samples= tags)

ValueError: too many values to unpack (expected 2)

As you can read in the documentation, ConditionalFreqDist helps you calculate

A collection of frequency distributions for a single experiment run under different conditions.
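Concretely, ConditionalFreqDist accepts an iterable of (condition, sample) pairs. A minimal sketch with a toy tagged list (hypothetical data, not taken from the Brown corpus):

```python
from nltk import ConditionalFreqDist

# Each item is a (condition, sample) pair -- here (word, tag).
pairs = [('the', 'AT'), ('dog', 'NN'), ('the', 'AT'),
         ('runs', 'VBZ'), ('the', 'DT')]

cfd = ConditionalFreqDist(pairs)
print(cfd['the'].most_common())  # -> [('AT', 2), ('DT', 1)]
print(cfd['the'].max())          # -> 'AT'
```

`most_common()` and `max()` come from FreqDist (a `Counter` subclass), so the most frequent tag for a word is simply `cfd[word].max()`.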

The only thing you must provide is the list of items and conditions, which in this problem translates to words and their corresponding POS tags. With minimal changes, the code looks like this: it calculates distributions for the whole corpus but tabulates the results only for the first 10 items and conditions (preventing a crash):

import nltk
from nltk import ConditionalFreqDist

tw = nltk.corpus.brown.tagged_words()
train_idx = int(0.8*len(tw))
training_set = tw[:train_idx]
test_set = tw[train_idx:]
words = list(zip(*training_set))[0]  # conditions (the words)
tags  = list(zip(*training_set))[1]  # samples (the POS tags)

# ConditionalFreqDist expects (condition, sample) pairs, i.e. (word, tag)
ofd = ConditionalFreqDist((word, tag) for word, tag in zip(words, tags))
ofd.tabulate(conditions=words[:10], samples=tags[:10])
