ConditionalFreqDist to find most frequent POS tags for words

Question

I am trying to find the most frequent POS tag for each word in the dataset, but I am struggling with the ConditionalFreqDist part.

import nltk
tw = nltk.corpus.brown.tagged_words()

train_idx = int(0.8*len(tw))
training_set = tw[:train_idx]
test_set = tw[train_idx:]

words= list(zip(*training_set))[0]

from nltk import ConditionalFreqDist
ofd= ConditionalFreqDist(word for word in list(zip(*training_set))[0])

tags= list(zip(*training_set))[1]
ofd.tabulate(conditions= words, samples= tags)

ValueError: too many values to unpack (expected 2)

Answer 1

As the documentation describes, ConditionalFreqDist computes:

A collection of frequency distributions for a single experiment run under different conditions.

All you need to provide is a sequence of (condition, sample) pairs, which in this problem are (word, POS tag) pairs. The ValueError in the question arises because the constructor was given bare words, so it tried to unpack each string into a pair. The code below, with minimal changes, computes the distributions for the whole training set but tabulates the results for only the first 10 words and tags (to avoid a crash):

import nltk
from nltk import ConditionalFreqDist

tw = nltk.corpus.brown.tagged_words()
train_idx = int(0.8*len(tw))
training_set = tw[:train_idx]
test_set = tw[train_idx:]
words = list(zip(*training_set))[0]  # conditions (the words)
tags = list(zip(*training_set))[1]  # samples (the POS tags)

# ConditionalFreqDist expects (condition, sample) pairs, here (word, tag)
ofd = ConditionalFreqDist((word, tag) for word, tag in zip(words, tags))
ofd.tabulate(conditions=words[:10], samples=tags[:10])
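
To go from the conditional distribution to the most frequent tag per word, which is what the question asks for, each condition's FreqDist offers a max() method. Here is a minimal sketch, assuming a small hand-made list of (word, tag) pairs so that it runs without downloading the Brown corpus; toy_tagged and most_frequent_tag are illustrative names, not part of the original post:

```python
from nltk import ConditionalFreqDist

# Hypothetical toy data standing in for brown.tagged_words(): (word, tag) pairs.
toy_tagged = [
    ("the", "AT"), ("run", "VB"), ("run", "NN"), ("run", "VB"),
    ("the", "AT"), ("fly", "NN"), ("fly", "VB"), ("fly", "NN"),
]

# Each pair is read as (condition, sample): the word is the condition,
# the tag is the sample counted under that condition.
cfd = ConditionalFreqDist(toy_tagged)

# For every word, pick the tag with the highest count.
most_frequent_tag = {word: cfd[word].max() for word in cfd.conditions()}

print(most_frequent_tag)
```

With the Brown data, the same dictionary comprehension should work on a ConditionalFreqDist built from training_set, since it already consists of (word, tag) pairs.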
