How to generate a bar chart with data from a CSV?
I have a CSV with several columns, one of which is the city column. There are several different cities, and the same city can be repeated several times. I would like to build a bar chart showing how many times each city appears in the CSV. Example:
Y X
5 Belo Horizonte
1 Vespasiano
4 São Paulo
I wrote the following code, but I get an error, which is shown right after the code.
Code:
import matplotlib.pyplot as plt; plt.rcdefaults()
import pandas as pd
import numpy as np
# reading the file
tb_usuarios = 'tb_usuarios.csv'
usuarios = pd.read_csv(tb_usuarios,
header=0,
index_col=False
)
print(usuarios.head())
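# drop rows with no city (assigning .dropna() back to the column would re-align on the index and keep the NaNs)
usuarios = usuarios.dropna(subset=["vc_municipio"])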
usuarios["vc_municipio"] = usuarios["vc_municipio"].str.upper()
municipio = usuarios.groupby(['vc_municipio'])
print(municipio)
y_pos = usuarios.groupby(['vc_municipio'])['vc_municipio'].count()
print(y_pos)
plt.bar(y_pos, municipio, align='center', alpha=0.5)
plt.xticks(y_pos, municipio)
plt.ylabel('Qtd')
plt.title('Municipio')
plt.show()
Error:
Traceback (most recent call last):
File "C:/Users/Henrique Mendes/PycharmProjects/emprestimo/venv1/emprestimo.py", line 20, in <module>
plt.bar(y_pos, municipio, align='center', alpha=0.5)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\pyplot.py", line 2440, in bar
**({"data": data} if data is not None else {}), **kwargs)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\__init__.py", line 1601, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\axes\_axes.py", line 2348, in bar
self._process_unit_info(xdata=x, ydata=height, kwargs=kwargs)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\axes\_base.py", line 2126, in _process_unit_info
kwargs = _process_single_axis(ydata, self.yaxis, 'yunits', kwargs)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\axes\_base.py", line 2108, in _process_single_axis
axis.update_units(data)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\axis.py", line 1493, in update_units
default = self.converter.default_units(data, self)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\category.py", line 115, in default_units
axis.set_units(UnitData(data))
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\category.py", line 181, in __init__
self.update(data)
File "C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\lib\site-packages\matplotlib\category.py", line 215, in update
for val in OrderedDict.fromkeys(data):
TypeError: unhashable type: 'numpy.ndarray'
My output:
"C:\Users\Henrique Mendes\PycharmProjects\emprestimo\venv1\Scripts\python.exe" "C:/Users/Henrique Mendes/PycharmProjects/emprestimo/venv1/emprestimo.py"
pr_usuario bl_administrador dt_nascimento ... dt_cheque es_anexo dt_anexo
0 2 0 24/02/1980 ... NaN NaN NaN
1 3 0 05/09/1985 ... NaN NaN NaN
2 4 1 20/03/1984 ... NaN NaN NaN
3 5 1 20/01/1982 ... NaN NaN NaN
4 6 0 25/05/1985 ... NaN NaN NaN
[5 rows x 30 columns]
{'BELO HORIZONTE': Int64Index([0, 1, 2, 3, 6, 9, 10, 14, 17, 20, 22, 25], dtype='int64'), 'BRASILIA': Int64Index([4], dtype='int64'), 'CONTAGEM': Int64Index([23], dtype='int64'), 'CURITIBA': Int64Index([5, 7, 15, 18, 19], dtype='int64'), 'SANTA LUZIA': Int64Index([21], dtype='int64'), 'VESPASIANO': Int64Index([24], dtype='int64')}
vc_municipio
BELO HORIZONTE 12
BRASILIA 1
CONTAGEM 1
CURITIBA 5
SANTA LUZIA 1
VESPASIANO 1
Name: vc_municipio, dtype: int64
How can I make this chart?
municipio = usuarios.groupby(['vc_municipio'])
returns a pandas GroupBy object, not a list of values, and matplotlib does not know how to handle it; that is what causes your error.
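A quick way to see the difference is to check the types. This is a minimal sketch on a small hypothetical DataFrame standing in for tb_usuarios.csv: the groupby call alone only describes the grouping, while selecting the column and counting produces a plain Series that matplotlib can plot.

```python
import pandas as pd

# Hypothetical stand-in for the vc_municipio column of tb_usuarios.csv
usuarios = pd.DataFrame(
    {"vc_municipio": ["BELO HORIZONTE", "CURITIBA", "BELO HORIZONTE", "VESPASIANO"]}
)

# groupby alone returns a GroupBy object: it holds no counts yet
grouped = usuarios.groupby(["vc_municipio"])
print(type(grouped).__name__)  # DataFrameGroupBy

# Counting per group returns a Series: index = city, values = number of rows
counts = usuarios.groupby("vc_municipio")["vc_municipio"].count()
print(type(counts).__name__)  # Series
print(counts.to_dict())
```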
plt.bar takes the x values followed by the bar heights (see the docs):
matplotlib.pyplot.bar(x, height, width=0.8, bottom=None, *, align='center', data=None, **kwargs)
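As a minimal sketch of that signature, assuming the counts are already computed (the cities and numbers below are taken from the question's output, reduced to three): passing strings as x makes matplotlib treat them as categories, and height gives one bar length per category.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

cities = ["BELO HORIZONTE", "BRASILIA", "CURITIBA"]  # x: one category per bar
totals = [12, 1, 5]                                  # height: one value per bar

fig, ax = plt.subplots()
bars = ax.bar(cities, totals, align="center", alpha=0.5)
ax.set_ylabel("Qtd")
# In an interactive session, call plt.show() here to display the chart
```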
Luckily, when you do a groupby in pandas, the group keys (your x values, i.e. the categories) automatically become the index of the result.
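To make that concrete, here is a small sketch with hypothetical data: after the grouping and count, the city names live in the index of the resulting Series and the counts are its values, which maps directly onto plt.bar's x and height.

```python
import pandas as pd

# Hypothetical sample of the city column
df = pd.Series(
    ["CURITIBA", "BELO HORIZONTE", "CURITIBA", "CONTAGEM"], name="vc_municipio"
).to_frame()

y_pos = df.groupby("vc_municipio")["vc_municipio"].count()

x_values = y_pos.index.tolist()  # the categories (groupby sorts keys by default)
heights = y_pos.tolist()         # the count for each category
print(x_values, heights)
```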
Assuming that municipio is meant to be the list of categories (you want the count per city?), the following should work.
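Replace your code
plt.bar(y_pos, municipio, align='center', alpha=0.5)
with
plt.bar(y_pos.index, y_pos, align='center', alpha=0.5)
You can also drop plt.xticks(y_pos, municipio): y_pos holds the counts, not tick positions, and with string x values matplotlib already labels each bar with its city name.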
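Putting the fix together, a sketch of the whole script under one assumption: the CSV content below is hypothetical and read from a string with io.StringIO so the example runs without tb_usuarios.csv. With the real file, pass 'tb_usuarios.csv' to pd.read_csv instead.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch; remove this line to get a window
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical CSV content standing in for tb_usuarios.csv (only two columns shown)
csv_text = """pr_usuario,vc_municipio
2,Belo Horizonte
3,belo horizonte
4,Curitiba
5,
6,Vespasiano
"""

usuarios = pd.read_csv(io.StringIO(csv_text), header=0, index_col=False)

# Normalise case so "belo horizonte" and "Belo Horizonte" are the same city,
# then drop rows with no city at all
usuarios["vc_municipio"] = usuarios["vc_municipio"].str.upper()
usuarios = usuarios.dropna(subset=["vc_municipio"])

# Series: index = city name, values = number of rows for that city
y_pos = usuarios.groupby("vc_municipio")["vc_municipio"].count()

fig, ax = plt.subplots()
bars = ax.bar(y_pos.index, y_pos, align="center", alpha=0.5)
ax.set_ylabel("Qtd")
ax.set_title("Municipio")
# plt.show()  # display the chart in an interactive session
```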
Alternatively, you can use the pandas version of plt.bar (DataFrame.plot.bar, which builds on matplotlib) to handle some of the DataFrame quirks natively.
Using pandas:
Assuming your data is in a .csv with the following form:
0.0,BELO HORIZONTE
1.0,BELO HORIZONTE
2.0,BELO HORIZONTE
3.0,BELO HORIZONTE
6.0,BELO HORIZONTE
9.0,BELO HORIZONTE
10.0,BELO HORIZONTE
14.0,BELO HORIZONTE
17.0,BELO HORIZONTE
20.0,BELO HORIZONTE
22.0,BELO HORIZONTE
25.0,BELO HORIZONTE
4.0,BRASILIA
23.0,CONTAGEM
5.0,CURITIBA
7.0,CURITIBA
15.0,CURITIBA
18.0,CURITIBA
19.0,CURITIBA
21.0,SANTA LUZIA
24.0,VESPASIANO
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('test.csv', header=None)
df.columns = ['value', 'city']
value city
0 0.0 BELO HORIZONTE
1 1.0 BELO HORIZONTE
2 2.0 BELO HORIZONTE
3 3.0 BELO HORIZONTE
4 6.0 BELO HORIZONTE
5 9.0 BELO HORIZONTE
6 10.0 BELO HORIZONTE
7 14.0 BELO HORIZONTE
8 17.0 BELO HORIZONTE
9 20.0 BELO HORIZONTE
10 22.0 BELO HORIZONTE
11 25.0 BELO HORIZONTE
12 4.0 BRASILIA
13 23.0 CONTAGEM
14 5.0 CURITIBA
15 7.0 CURITIBA
16 15.0 CURITIBA
17 18.0 CURITIBA
18 19.0 CURITIBA
19 21.0 SANTA LUZIA
20 24.0 VESPASIANO
# groupby & count
city_count = df.groupby('city').count()
value
city
BELO HORIZONTE 12
BRASILIA 1
CONTAGEM 1
CURITIBA 5
SANTA LUZIA 1
VESPASIANO 1
# plot
city_count.plot.bar()
plt.ylabel('Qtd')
plt.title('Municipio')
plt.show()
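One detail about the groupby step: .count() reports the number of non-missing entries in each remaining column (here only value), so a city whose value cell is empty would be under-counted. If all you want is how many rows each city has, .size() counts rows regardless of missing values. A small sketch with a hypothetical missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one CURITIBA row has no value
df = pd.DataFrame({
    "value": [0.0, 1.0, 5.0, np.nan],
    "city": ["BELO HORIZONTE", "BELO HORIZONTE", "CURITIBA", "CURITIBA"],
})

non_null = df.groupby("city")["value"].count()  # counts only non-NaN values
rows = df.groupby("city").size()                # counts rows per city

print(non_null.to_dict())  # {'BELO HORIZONTE': 2, 'CURITIBA': 1}
print(rows.to_dict())      # {'BELO HORIZONTE': 2, 'CURITIBA': 2}
```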
Plot with seaborn:
import seaborn as sns
sns.barplot(x=city_count.index, y='value', data=city_count)
plt.xticks(rotation=45)
plt.show()
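A further option, swapped in here rather than taken from the answer above: Series.value_counts() counts each city in one step (largest count first), and the resulting Series can be plotted directly with .plot.bar(). A sketch on a hypothetical, already upper-cased city column:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical city column
cities = pd.Series([
    "BELO HORIZONTE", "CURITIBA", "BELO HORIZONTE",
    "CONTAGEM", "CURITIBA", "BELO HORIZONTE",
])

counts = cities.value_counts()   # index = city, values = occurrences, largest first
ax = counts.plot.bar(alpha=0.5)  # pandas plotting method, drawn with matplotlib
ax.set_ylabel("Qtd")
ax.set_title("Municipio")
# plt.show()  # display the chart in an interactive session
```

Sorting by frequency is often what you want in a chart like this; if you prefer alphabetical order, call counts.sort_index() before plotting.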