I am working with a csv file with pandas module on python3. Csv file consists of 5 columns: job, company's name, description of the job, amount of reviews, location of the job; and i want to plot a frequency histogram , where i pick only the jobs containing the words "mechanical engineer" and find the frequencies of the 5 most frequent locations for the "mechanical engineer" job.
So,i defined a variable engloc which stores all the "mechanical engineer" jobs.
engloc=df[df.position.str.contains('mechanical engineer|mechanical engineering', flags=re.IGNORECASE, regex=True)].location
and did a histogram plot with matplotlib with code i found online
x = np.random.normal(size = 1000)
plt.hist(engloc, bins=50)
plt.gca().set(title='Frequency Histogram ', ylabel='Frequency');
but it printed like this
How can i plot a proper frequency histogram where it plots using only 5 of the most frequent locations for jobs containing "mechanical engineer" words, instead of putting all of the locations in the graph?
Something along the following lines should help you with numerical data:
import numpy as np
counts_, bins_ = np.histogram(englog.values)
filtered = [(c,b) for (c,b) in zip(counts_,bins_) if counts_>=5]
counts, bins = list(zip(*filtered))
plt.hist(bins[:-1], bins, weights=counts)
For a string type try:
from collections import Counter
coords, counts = list(zip(*Counter(englog.values).most_common(5)))
plt.bar(coords, counts)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.