Python Pyplot word occurrence frequency

I have to plot how many words occur at each frequency in a txt file. So far I have a dictionary that maps each word to the number of times it appears in the txt file. In order to plot, I have to convert that dictionary into a new dictionary (I'm assuming) that counts the number of words at each frequency. For instance, if 5 words each appear 3 times in the txt file, they should be grouped into a single entry, so that the plot shows the frequency on the x axis and the number of words at that frequency on the y axis.

What I have now is simply not working:

def plot(word_dict):
    new_dict = {}
    for value in word_dict.values():
        if value in word_dict:
             new_dict += 1
        else:
            new_dict = 1
        y = new_dict[value]
        x = word_dict[value]
    pyplot.plot(x, y)
    pyplot.show()

A sample of data:

{'bangs': 1, 'sees': 1, 'stuff,': 1, 'Knox....': 1, 'Well': 1, 'about': 2, 'your': 1, 'blocks.': 1, 'what': 4, 'beetles....': 1, 'Boom': 1, 'blue': 1, 'paddled': 1, 'mixed': 1, 'fox': 5, 'Through': 1, 'on': 16, 'trick,': 2, 'When': 4, '...a': 1, 'silly': 1, 'band.': 2, 'come.': 3, "We'll": 2, 'likes': 2, 'slick,': 1, 'comes?': 1, 'chick': 1, 'goo,': 1, "it's": 2, 'then,': 1, 'muddled': 1, 'Now': 3, 'not': 1, 'flew,': 1, 'If,': 1, 'sneeze.': 1, 'bottled': 1, 'paddle': 4, 'called': 1, 'Goo-Goose,': 1, 'Blue': 2, 'Come': 1, 'fox.': 1, 'can': 3, 'poodle,': 1, 'this': 7, "Sue's": 4, 'Ben': 5, 'is': 7, 'goes': 1, 'to': 10, 'Crow': 4, 'cheese': 2, 'quick': 5, 'sir.': 27, 'easy,': 1, 'Clocks': 2, 'Fox': 6, 'Stop': 2, 'up': 1, 'be': 1, 'Well...': 1, 'hose': 2, 'Rose': 1, 'three': 4, 'Freezy': 2, 'New': 3, 'hate': 1, 'broom': 2, 'quite': 1, 'duck': 3, 'we': 1, 'done,': 1, 'tick.': 2, "can't": 5, 'beetles?': 1, 'well,': 2, 'box.': 4, "That's": 4, 'Do,': 1, 'say': 4, 'chicks': 5, '...': 1, 'enough,': 1, 'brick': 1, 'lot': 1, 'You': 4, 'sick': 2, 'that': 1, 'goo.': 4, 'Gooey': 1, 'made': 3, 'new': 5, 'noodles...': 1, 'Knox,': 6, 'for': 2, 'muddle.': 1, 'Bricks': 1, 'Luck': 4, 'Bim': 5, 'minute,': 1, 'brings': 2, 'bottle': 4, 'duddled': 1, "I'll": 3, 'come': 2, 'battles': 1, 'clocks': 2, 'such': 2, 'Then': 1, 'in': 19, 'sir....': 1, 'Two': 1, 'Knox.': 2, "Luke's": 1, 'lakes.': 4, 'trees': 3, "isn't": 2, 'band!': 4, 'our': 1, 'And': 2, 'blubber!': 1, 'another': 1, 'sews': 9, "bottle's": 1, "Crow's": 3, 'Step': 1, 'What': 1, 'grows': 1, 'like': 1, 'ticks': 2, 'too': 1, 'trick': 4, 'Fox,': 3, 'goo': 2, 'chewing!': 1, 'blocks': 3, 'fleas': 3, 'a': 24, 'lakes': 2, "don't": 2, 'those': 1, 'Luke': 4, 'sorry,': 1, 'tocks,': 2, 'Whose': 1, 'you': 3, 'Here': 1, 'tricks': 2, "poodle's": 1, 'they': 3, 'that.': 1, 'doing.': 1, 'Gluey.': 2, 'eating': 1, 'sir!': 1, 'breeze': 2, 'My': 4, 'tweetle': 11, 'these': 5, 'puddle,': 2, 'chewy': 1, 'tongue': 3, 'talk': 1, 'with': 11, 'beetles': 6, 
'noodle': 2, 'make': 5, 'who': 1, 'lame,': 1, 'flew.': 1, "I'm": 1, 'Fox!': 2, 'Nose': 1, 'the': 7, 'I': 9, "crow's": 2, 'Thank': 1, 'easy': 2, 'likes.': 2, 'battle': 7, 'licks': 4, 'goes.': 1, 'socks': 4, 'lead': 1, 'muddle': 1, 'shame,': 1, 'Please,': 1, 'fight,': 1, 'fun,': 1, 'chew,': 2, 'fuddled': 1, 'Broom': 1, 'No,': 1, 'Hose': 1, 'something': 2, 'find': 3, 'know': 1, 'Who': 4, 'call...': 1, 'First,': 1, 'Gooey.': 2, 'Look,': 2, 'fight': 1, 'This': 1, "Luck's": 1, 'poor': 2, 'now.': 6, 'freeze.': 2, 'game': 4, "Ben's": 5, 'it!': 2, 'Joe': 5, 'their': 2, 'you,': 1, 'Box': 1, 'bands.': 2, 'it': 3, 'bands': 1, 'bricks': 5, "here's": 1, "Let's": 3, 'Sue': 5, 'when': 2, 'clocks,': 2, 'breaks.': 2, 'puddle': 8, 'Socks': 4, 'sir,': 6, 'an': 2, "Bim's": 5, 'Pig': 2, 'now....': 1, 'battle.': 4, 'Slow': 5, 'sew': 2, 'blew.': 1, 'bring': 1, 'game,': 1, 'AND...': 3, 'and': 16, 'brooms.': 1, 'way.': 2, 'booms.': 1, 'lots': 1, 'clock': 1, 'comes.': 4, 'please....': 1, 'then...': 1, '...they': 2, 'say....': 1, 'beetle': 7, 'nose.': 1, 'slow,': 1, 'or': 1, 'Six': 2, 'AND': 1, 'block': 1, 'broom.': 4, 'do': 6, 'it,': 1, 'some.': 2, 'Duck': 1, 'sir?': 2, 'grows.': 1, 'this,': 1, 'Very': 2, 'Big': 2, 'whose': 3, 'noodle-eating': 1, 'chew': 2, 'choose': 2, 'Mr.': 13, 'band': 2, "Here's": 2, 'it.': 2, 'call': 3, 'dumb': 1, 'have': 2, 'so': 2, 'Goo-Goose': 1, 'say.': 2, 'socks.': 5, "trees'": 1, 'poodle': 3, 'socks,': 4, 'my': 1, 'While': 1, 'play.': 2, 'Chicks': 3, 'stack.': 4, 'rose': 2, 'freezy': 1, 'clothes.': 3, 'makes': 1, 'little': 1, 'paddles': 3, 'box': 2, 'all': 1, 'free': 2, 'blocks,': 1, 'Do': 1, 'blab': 1, 'THIS': 1, 'thing': 1, 'bends': 2, 'bent': 2, 'Knox': 8, 'socks?': 2, 'tock.': 2, 'wuddled': 1, 'much': 1, 'takes': 2, 'bends.': 2, 'wait': 1, 'see': 1, 'rubber.': 1, 'of': 4, 'clothes?': 2, 'mouth': 3, 'bottle...': 1, 'too,': 1, 'blibber': 1, 'Try': 2, 'where': 1, "won't": 2, 'get': 1}

Use a Counter from the collections library.

The values you want to count are the values of your word_dict (i.e. the frequency of each word), so you'll need to initialize the Counter instance like freq = Counter(word_dict.values()). Then you can extract the x and y series for your plot with freq.keys() and freq.values().
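A minimal sketch of the Counter approach, using a small hand-picked subset of the sample data rather than the full dictionary. The names xs, ys and show_plot are illustrative only, and the bar chart is an assumption about the kind of plot wanted. matplotlib is imported inside the function so the counting part runs without it.

```python
from collections import Counter

# Small subset of the sample data from the question.
word_dict = {'bangs': 1, 'sees': 1, 'about': 2, 'what': 4, 'fox': 5,
             'When': 4, 'come.': 3, 'Now': 3, 'can': 3}

# Count how many words share each frequency.
freq = Counter(word_dict.values())

# Sort by frequency so the x axis is in increasing order.
xs = sorted(freq)
ys = [freq[x] for x in xs]


def show_plot(xs, ys):
    """Draw the number of words (y) at each frequency (x)."""
    from matplotlib import pyplot  # imported lazily; requires matplotlib
    pyplot.bar(xs, ys)
    pyplot.xlabel('frequency')
    pyplot.ylabel('number of words')
    pyplot.show()
```

Calling show_plot(xs, ys) then displays the chart. Sorting the keys first matters mostly if you switch to pyplot.plot, which connects the points in the order given.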

It seems as though you are attempting to plot strings along your x-axis, namely the keys you are using. This is not how pyplot works. You need to plot your values against a numeric vector (typically a numpy array). Once you have done this you can relabel your independent ( x ) vector using the xticks command.

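import numpy
from matplotlib import pyplot
x = numpy.linspace(0, len(new_dict) - 1, len(new_dict))  # one numeric position per key
pyplot.xticks(x, list(new_dict.keys()))  # relabel the positions with the keys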
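To make the relabeling idea concrete, here is a hedged sketch that plots against integer positions and then puts the keys on the ticks. The helper names tick_data and plot_labeled_bars are hypothetical, and the example dictionary is a made-up subset of the sample data.

```python
def tick_data(counts):
    """Split a dict into numeric positions, tick labels and bar heights."""
    labels = list(counts.keys())
    positions = list(range(len(labels)))
    heights = list(counts.values())
    return positions, labels, heights


def plot_labeled_bars(counts):
    """Plot values at numeric positions, then relabel the x axis."""
    from matplotlib import pyplot  # imported lazily; requires matplotlib
    positions, labels, heights = tick_data(counts)
    pyplot.bar(positions, heights)
    pyplot.xticks(positions, labels, rotation=90)
    pyplot.show()


example = {'fox': 5, 'on': 16, 'sir.': 27}
positions, labels, heights = tick_data(example)
```

Calling plot_labeled_bars(example) draws three bars labeled with the words. With several hundred words the labels become unreadable, which is one reason to plot the number of words per frequency instead.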

Assuming you mean reversing the keys and values, you can do:

>>> di={'bangs': 1, 'sees': 1, 'stuff,': 1, 'Knox....': 1, 'Well': 1, 'about': 2, 'your': 1, 'blocks.': 1, 'what': 4, 'beetles....': 1, 'Boom': 1, 'blue': 1, 'paddled': 1, 'mixed': 1, 'fox': 5, 'Through': 1, 'on': 16, 'trick,': 2, 'When': 4, '...a': 1, 'silly': 1, 'band.': 2, 'come.': 3, "We'll": 2, 'likes': 2, 'slick,': 1, 'comes?': 1, 'chick': 1, 'goo,': 1, "it's": 2, 'then,': 1, 'muddled': 1, 'Now': 3, 'not': 1, 'flew,': 1, 'If,': 1, 'sneeze.': 1, 'bottled': 1, 'paddle': 4, 'called': 1, 'Goo-Goose,': 1, 'Blue': 2, 'Come': 1, 'fox.': 1, 'can': 3, 'poodle,': 1, 'this': 7, "Sue's": 4, 'Ben': 5, 'is': 7, 'goes': 1, 'to': 10, 'Crow': 4, 'cheese': 2, 'quick': 5, 'sir.': 27, 'easy,': 1, 'Clocks': 2, 'Fox': 6, 'Stop': 2, 'up': 1, 'be': 1, 'Well...': 1, 'hose': 2, 'Rose': 1, 'three': 4, 'Freezy': 2, 'New': 3, 'hate': 1, 'broom': 2, 'quite': 1, 'duck': 3, 'we': 1, 'done,': 1, 'tick.': 2, "can't": 5, 'beetles?': 1, 'well,': 2, 'box.': 4, "That's": 4, 'Do,': 1, 'say': 4, 'chicks': 5, '...': 1, 'enough,': 1, 'brick': 1, 'lot': 1, 'You': 4, 'sick': 2, 'that': 1, 'goo.': 4, 'Gooey': 1, 'made': 3, 'new': 5, 'noodles...': 1, 'Knox,': 6, 'for': 2, 'muddle.': 1, 'Bricks': 1, 'Luck': 4, 'Bim': 5, 'minute,': 1, 'brings': 2, 'bottle': 4, 'duddled': 1, "I'll": 3, 'come': 2, 'battles': 1, 'clocks': 2, 'such': 2, 'Then': 1, 'in': 19, 'sir....': 1, 'Two': 1, 'Knox.': 2, "Luke's": 1, 'lakes.': 4, 'trees': 3, "isn't": 2, 'band!': 4, 'our': 1, 'And': 2, 'blubber!': 1, 'another': 1, 'sews': 9, "bottle's": 1, "Crow's": 3, 'Step': 1, 'What': 1, 'grows': 1, 'like': 1, 'ticks': 2, 'too': 1, 'trick': 4, 'Fox,': 3, 'goo': 2, 'chewing!': 1, 'blocks': 3, 'fleas': 3, 'a': 24, 'lakes': 2, "don't": 2, 'those': 1, 'Luke': 4, 'sorry,': 1, 'tocks,': 2, 'Whose': 1, 'you': 3, 'Here': 1, 'tricks': 2, "poodle's": 1, 'they': 3, 'that.': 1, 'doing.': 1, 'Gluey.': 2, 'eating': 1, 'sir!': 1, 'breeze': 2, 'My': 4, 'tweetle': 11, 'these': 5, 'puddle,': 2, 'chewy': 1, 'tongue': 3, 'talk': 1, 'with': 11, 'beetles': 
6, 'noodle': 2, 'make': 5, 'who': 1, 'lame,': 1, 'flew.': 1, "I'm": 1, 'Fox!': 2, 'Nose': 1, 'the': 7, 'I': 9, "crow's": 2, 'Thank': 1, 'easy': 2, 'likes.': 2, 'battle': 7, 'licks': 4, 'goes.': 1, 'socks': 4, 'lead': 1, 'muddle': 1, 'shame,': 1, 'Please,': 1, 'fight,': 1, 'fun,': 1, 'chew,': 2, 'fuddled': 1, 'Broom': 1, 'No,': 1, 'Hose': 1, 'something': 2, 'find': 3, 'know': 1, 'Who': 4, 'call...': 1, 'First,': 1, 'Gooey.': 2, 'Look,': 2, 'fight': 1, 'This': 1, "Luck's": 1, 'poor': 2, 'now.': 6, 'freeze.': 2, 'game': 4, "Ben's": 5, 'it!': 2, 'Joe': 5, 'their': 2, 'you,': 1, 'Box': 1, 'bands.': 2, 'it': 3, 'bands': 1, 'bricks': 5, "here's": 1, "Let's": 3, 'Sue': 5, 'when': 2, 'clocks,': 2, 'breaks.': 2, 'puddle': 8, 'Socks': 4, 'sir,': 6, 'an': 2, "Bim's": 5, 'Pig': 2, 'now....': 1, 'battle.': 4, 'Slow': 5, 'sew': 2, 'blew.': 1, 'bring': 1, 'game,': 1, 'AND...': 3, 'and': 16, 'brooms.': 1, 'way.': 2, 'booms.': 1, 'lots': 1, 'clock': 1, 'comes.': 4, 'please....': 1, 'then...': 1, '...they': 2, 'say....': 1, 'beetle': 7, 'nose.': 1, 'slow,': 1, 'or': 1, 'Six': 2, 'AND': 1, 'block': 1, 'broom.': 4, 'do': 6, 'it,': 1, 'some.': 2, 'Duck': 1, 'sir?': 2, 'grows.': 1, 'this,': 1, 'Very': 2, 'Big': 2, 'whose': 3, 'noodle-eating': 1, 'chew': 2, 'choose': 2, 'Mr.': 13, 'band': 2, "Here's": 2, 'it.': 2, 'call': 3, 'dumb': 1, 'have': 2, 'so': 2, 'Goo-Goose': 1, 'say.': 2, 'socks.': 5, "trees'": 1, 'poodle': 3, 'socks,': 4, 'my': 1, 'While': 1, 'play.': 2, 'Chicks': 3, 'stack.': 4, 'rose': 2, 'freezy': 1, 'clothes.': 3, 'makes': 1, 'little': 1, 'paddles': 3, 'box': 2, 'all': 1, 'free': 2, 'blocks,': 1, 'Do': 1, 'blab': 1, 'THIS': 1, 'thing': 1, 'bends': 2, 'bent': 2, 'Knox': 8, 'socks?': 2, 'tock.': 2, 'wuddled': 1, 'much': 1, 'takes': 2, 'bends.': 2, 'wait': 1, 'see': 1, 'rubber.': 1, 'of': 4, 'clothes?': 2, 'mouth': 3, 'bottle...': 1, 'too,': 1, 'blibber': 1, 'Try': 2, 'where': 1, "won't": 2, 'get': 1}

new_di={}
for k, v in di.items():
    new_di.setdefault(v, []).append(k)

>>> new_di
{1: ['What', 'game,', 'Whose', 'Thank', 'Broom', 'goo,', 'bring', 'fuddled', 'hate', 'Hose', 'then,', 'sneeze.', 'Here', 'sir....', 'Please,', '...', 'it,', 'get', 'Goo-Goose', 'bands', 'muddle', 'nose.', 'Goo-Goose,', 'sorry,', 'not', "I'm", 'little', 'No,', 'like', 'THIS', 'poodle,', 'Knox....', 'Bricks', 'blibber', 'chick', 'where', 'Rose', 'see', 'noodle-eating', 'call...', 'fun,', 'blue', 'chewing!', 'clock', 'lots', 'slow,', 'sir!', 'chewy', 'goes', 'beetles?', 'Do', 'goes.', 'flew.', 'Box', 'be', 'we', 'eating', 'this,', 'stuff,', "poodle's", 'Duck', 'Well...', 'then...', 'quite', 'minute,', 'Step', 'doing.', 'wait', 'brooms.', 'bottle...', 'thing', 'bangs', 'mixed', 'fight,', 'makes', 'or', 'grows.', 'duddled', 'all', 'too,', 'Two', 'Gooey', 'Boom', 'another', 'If,', 'done,', 'your', '...a', 'First,', 'now....', 'fight', 'muddle.', "trees'", 'too', 'lot', 'enough,', 'blew.', 'brick', 'This', 'Come', 'easy,', 'that', 'Well', "Luke's", 'those', "here's", 'say....', 'up', 'you,', 'freezy', 'silly', 'flew,', 'wuddled', 'dumb', 'my', 'called', 'lame,', 'sees', 'Do,', 'comes?', "Luck's", 'blubber!', 'rubber.', 'shame,', 'paddled', 'Then', 'blab', 'battles', 'booms.', 'bottled', 'please....', 'Through', 'grows', 'muddled', 'that.', 'our', 'who', 'much', 'slick,', 'Nose', 'blocks,', "bottle's", 'While', 'beetles....', 'noodles...', 'lead', 'fox.', 'AND', 'blocks.', 'block', 'talk', 'know'], 2: ['Blue', "don't", 'choose', 'clocks', 'band.', 'tock.', 'Big', 'broom', 'some.', "crow's", 'easy', 'it.', 'it!', 'Try', 'tocks,', 'Pig', 'Clocks', "isn't", 'likes', 'sew', 'chew', 'bends', 'Very', 'box', 'puddle,', 'Knox.', 'band', 'Six', 'for', 'ticks', '...they', "Here's", 'hose', 'And', 'free', 'say.', 'come', 'about', 'chew,', 'likes.', 'Freezy', 'way.', 'tick.', 'rose', 'cheese', 'bent', 'takes', 'their', "it's", "We'll", 'Fox!', 'brings', 'noodle', 'clocks,', 'Gooey.', 'Gluey.', 'sir?', 'when', 'breaks.', 'have', 'an', 'well,', 'something', 'clothes?', 'bends.', 'Stop', 
'trick,', 'sick', 'poor', "won't", 'bands.', 'goo', 'play.', 'socks?', 'such', 'tricks', 'freeze.', 'breeze', 'so', 'lakes', 'Look,'], 3: ['find', 'Now', 'mouth', 'trees', 'they', 'Chicks', 'fleas', 'New', 'come.', 'whose', 'AND...', 'tongue', 'poodle', 'duck', 'call', 'Fox,', "I'll", 'made', 'can', 'paddles', 'it', 'clothes.', "Let's", 'you', 'blocks', "Crow's"], 4: ['goo.', 'band!', 'game', 'socks', 'battle.', 'My', 'lakes.', 'broom.', 'what', 'paddle', "Sue's", 'of', 'When', 'Socks', 'three', 'box.', 'licks', "That's", 'trick', 'socks,', 'say', 'comes.', 'You', 'stack.', 'Luke', 'Who', 'Luck', 'Crow', 'bottle'], 5: ['chicks', 'Bim', 'quick', 'Sue', 'fox', 'Joe', 'new', "Bim's", "can't", 'bricks', 'socks.', "Ben's", 'Ben', 'Slow', 'make', 'these'], 6: ['Fox', 'Knox,', 'do', 'now.', 'sir,', 'beetles'], 7: ['beetle', 'battle', 'this', 'is', 'the'], 8: ['Knox', 'puddle'], 9: ['sews', 'I'], 10: ['to'], 11: ['tweetle', 'with'], 13: ['Mr.'], 16: ['and', 'on'], 19: ['in'], 24: ['a'], 27: ['sir.']}
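From the grouped dictionary, the two series for the plot are the keys and the length of each word list. A short sketch on a small subset of the data (the names x and y are illustrative, not part of the original answer):

```python
# Small subset of the sample data from the question.
di = {'bangs': 1, 'sees': 1, 'about': 2, 'what': 4, 'fox': 5,
      'When': 4, 'come.': 3, 'Now': 3, 'can': 3}

# Group the words by their frequency, as in the answer above.
new_di = {}
for k, v in di.items():
    new_di.setdefault(v, []).append(k)

# x: each frequency, y: how many words occur with that frequency.
x = sorted(new_di)
y = [len(new_di[freq]) for freq in x]
```

These lists can be passed to pyplot.plot(x, y) or pyplot.bar(x, y). Compared with a Counter, this keeps the words themselves, which helps if you also want to inspect which words fall into each group.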

I'm not sure what you used for tokenizing your data, but a quick solution could be using nltk.

Here is a small example of how it can be done:

# necessary imports
from nltk import FreqDist # used later to plot and get count
from nltk.tokenize import word_tokenize # tokenizes our sentence by word

# sample text
text = ('this is a super long text, that has some random words in it. It is not really '
        'that long, but could be very long.')
tknz = word_tokenize(text) # tokenizes the text into a list: ['this', 'is', ...]
fdist = FreqDist(tknz) # creates frequency distribution from the tokenized words

From that you can simply call fdist.plot(), which gives:

[Image: frequency plot produced by fdist.plot()]

From here you have a matplotlib plot that you can edit, and it only took a few lines to obtain.

You can find additional information about FreqDist here. It also behaves like a dictionary:

>>> fdist.items()
dict_items([(',', 2), ('in', 1), ('a', 1), ('very', 1), ('really', 1), ('be', 1), ...])
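Because FreqDist is a subclass of collections.Counter, its values can be counted again to get the number of words at each frequency, which is what the question asks for. The sketch below uses a plain Counter with str.split as a stand-in tokenizer so that it runs without NLTK. That substitution is an assumption, and it keeps punctuation attached to words, unlike word_tokenize.

```python
from collections import Counter

text = ('this is a super long text, that has some random words in it. '
        'It is not really that long, but could be very long.')

# Stand-in for FreqDist(word_tokenize(text)): whitespace tokens only.
word_counts = Counter(text.split())

# Count how many distinct tokens occur with each frequency.
freq_of_freq = Counter(word_counts.values())
```

With an actual FreqDist, the same second step would be Counter(fdist.values()).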
