
Create a simple point plot in Python

I am supposed to count n-grams in a corpus and create a point plot showing the ranks of words against their counts, as an exercise to verify Zipf's law. The end result should look something like this:

[image]

I extracted the distributions (here only for unigrams) using nltk as such:

import nltk
with open(r'./1.txt', 'r') as file:
    text = file.read()  # the with block closes the file automatically

tokens = nltk.word_tokenize(text)
tokens = [token.lower() for token in tokens if len(token) > 1]
fdist = nltk.FreqDist(tokens)
ranks = fdist.most_common()

This gives me a long list of 2-tuples of all the words and their counts ranked from the most common to the least.

I am wondering how I should proceed from here. I just have to plot this on a two-axis plane. I don't have matplotlib/numpy installed and have no experience with those libraries. However, I do have Microsoft Excel, so I was wondering whether I could export this data in a format readable by Excel and plot it there.

The following lines will plot your data the way you requested using matplotlib:

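import matplotlib.pyplot as plt

# x is the zero-based position in the ranking, y is the word count; 'ro' draws red circles
plt.plot(range(len(ranks)), [r[1] for r in ranks], 'ro')
# These axis limits fit a small sample; widen or remove them for a real corpus
plt.ylim([0, 12])
plt.xlim([0, 10])
plt.show()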

Installing matplotlib is simple. See here for instructions for your operating system: http://matplotlib.org/users/installing.html

If you're going to do plotting with Python, install matplotlib. Get your data into two vectors, x and y, whose corresponding entries are the x and y values of each point.

Then simply do

import pylab
pylab.plot(x, y, '.')
pylab.savefig('myfilename.pdf')

The '.' tells it to plot dots.

You can save in many formats other than .pdf. To save in another format, just change the .pdf extension to the one you want; if it's a supported format, it will work.
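For the word counts in the question, the two vectors can be built from the list of (word, count) pairs that fdist.most_common() returns. The following is only a minimal sketch: the short `ranks` list is made-up sample data standing in for the real corpus output, and starting the ranks at 1 is a choice made here for illustration.

```python
# Hypothetical sample standing in for fdist.most_common(); real data will be much longer.
ranks = [('the', 12), ('of', 7), ('and', 5), ('zipf', 2), ('law', 1)]

# x holds the rank of each word (1 = most common), y holds its count.
x = list(range(1, len(ranks) + 1))
y = [count for _, count in ranks]

print(x)
print(y)
```

These two lists can then be passed to pylab.plot(x, y, '.') as shown above.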

You could create an Excel scatter plot using XlsxWriter:

[image]
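If installing XlsxWriter is not an option either, a plain CSV file written with Python's standard csv module can be opened in Excel and charted there as a scatter plot. This is a minimal sketch rather than a definitive recipe: the `ranks` list is made-up sample data standing in for fdist.most_common(), and the file name ranks.csv and the column headers are assumptions chosen for illustration.

```python
import csv

# Hypothetical sample standing in for fdist.most_common().
ranks = [('the', 12), ('of', 7), ('and', 5), ('zipf', 2), ('law', 1)]

# 'ranks.csv' is an assumed file name; newline='' avoids blank rows on Windows.
with open('ranks.csv', 'w', newline='', encoding='utf-8') as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(['rank', 'word', 'count'])  # header row
    for rank, (word, count) in enumerate(ranks, start=1):
        writer.writerow([rank, word, count])
```

In Excel, selecting the rank and count columns and inserting a scatter chart should give the rank-versus-count plot. If the corpus contains non-ASCII words, Excel may need encoding='utf-8-sig' instead of 'utf-8' to display them correctly.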


 