Error when plotting a frequency histogram from CSV data

Question

I am working with a CSV file using the pandas module in Python 3. The file has 5 columns: job, company name, job description, number of reviews, and job location. I want to plot a frequency histogram in which I select only the jobs containing the words "mechanical engineer" and show the frequencies of the 5 most frequent locations for those jobs.

So I defined a variable engloc, which stores the locations of all the "mechanical engineer" jobs.

engloc=df[df.position.str.contains('mechanical engineer|mechanical engineering', flags=re.IGNORECASE, regex=True)].location

and plotted a histogram with matplotlib, using code I found online:

 x = np.random.normal(size = 1000)
 plt.hist(engloc, bins=50)
 plt.gca().set(title='Frequency Histogram ', ylabel='Frequency');

but the plot came out like this:

How can I plot a proper frequency histogram that uses only the 5 most frequent locations for jobs containing the words "mechanical engineer", instead of putting every location on the graph?

This is a sample from the CSV file:

Answer 1

Something along the following lines should help with numerical data:

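import numpy as np
import matplotlib.pyplot as plt

counts_, bins_ = np.histogram(engloc.values)
# Zero out bins holding fewer than 5 values, keep the rest
counts = np.where(counts_ >= 5, counts_, 0)
# Replot the precomputed histogram: one sample per bin, weighted by its count
plt.hist(bins_[:-1], bins_, weights=counts)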
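
To see why the bin edges are sliced with `[:-1]`, note that `np.histogram` returns one more edge than it returns counts. A small check on made-up numbers (the `values` array and the threshold of 3 below are illustrative assumptions, not taken from the CSV):

```python
import numpy as np

# Made-up numeric sample; not from the question's CSV
values = np.array([1, 1, 2, 2, 2, 3, 7, 8, 8, 9])

# Ask for 4 equal-width bins over the data range [1, 9]
counts, edges = np.histogram(values, bins=4)

# Keep only bins holding at least 3 values (threshold chosen for illustration)
keep = counts >= 3
left_edges = edges[:-1][keep]
```

Here `edges` has 5 entries for 4 bins, so `edges[:-1]` lines up one left edge with each count before the mask is applied.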

For string data, try:

from collections import Counter
import matplotlib.pyplot as plt

# most_common(5) returns the 5 most frequent locations with their counts
coords, counts = zip(*Counter(engloc.values).most_common(5))
plt.bar(coords, counts)
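
An alternative that stays within pandas is `Series.value_counts`, which returns counts sorted in descending order, so `head(5)` gives the five most frequent locations. A minimal sketch, assuming the column names `position` and `location` from the question; the sample rows are made up for illustration and stand in for the real CSV:

```python
import re

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up rows standing in for the CSV; real data would come from pd.read_csv
df = pd.DataFrame({
    "position": ["Mechanical Engineer", "Senior mechanical engineer", "Nurse",
                 "Mechanical Engineering Intern", "mechanical engineer II",
                 "MECHANICAL ENGINEER"],
    "location": ["Austin, TX", "Houston, TX", "Austin, TX",
                 "Austin, TX", "Houston, TX", "Austin, TX"],
})

# Same idea as the question's filter; "mechanical engineer" also matches "mechanical engineering"
mask = df["position"].str.contains("mechanical engineer", flags=re.IGNORECASE, regex=True)
engloc = df.loc[mask, "location"]

# value_counts sorts by frequency, so head(5) keeps at most the top five locations
top5 = engloc.value_counts().head(5)

# One bar per location, bar height equal to its count
ax = top5.plot(kind="bar")
ax.set(title="Top 5 locations", ylabel="Frequency")
plt.tight_layout()
```

Strictly speaking this is a bar chart of category counts rather than a histogram, which suits string-valued locations better than `plt.hist` with numeric bins.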


Answer 1
1 accepted 2020-02-05 19:56:41
