
Matplotlib Histogram with non numerical data

I cannot plot a histogram in Matplotlib with non-numerical data.

A = na, R, O, na, na, O, R ...

A is a DataFrame column that takes 3 different values: na, R, O

I try:

plt.hist(A, bins=3, color='#37777D')

I would expect something like this: [image: Result]

It works with numerical data, but with non numerical data I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44-60369a6f9af4> in <module>
      1 A = dataset2.iloc[:, 2 - 1].head(30)
----> 2 plt.hist(A, bins=3, histtype='bar', color='#37777D')

C:\Anaconda\lib\site-packages\matplotlib\pyplot.py in hist(x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, normed, data, **kwargs)
   2657         align=align, orientation=orientation, rwidth=rwidth, log=log,
   2658         color=color, label=label, stacked=stacked, normed=normed,
-> 2659         **({"data": data} if data is not None else {}), **kwargs)
   2660 
   2661 

C:\Anaconda\lib\site-packages\matplotlib\__init__.py in inner(ax, data, *args, **kwargs)
   1808                         "the Matplotlib list!)" % (label_namer, func.__name__),
   1809                         RuntimeWarning, stacklevel=2)
-> 1810             return func(ax, *args, **kwargs)
   1811 
   1812         inner.__doc__ = _add_data_doc(inner.__doc__,

C:\Anaconda\lib\site-packages\matplotlib\axes\_axes.py in hist(self, x, bins, range, density, weights, cumulative, bottom, histtype, align, orientation, rwidth, log, color, label, stacked, normed, **kwargs)
   6563                     "color kwarg must have one color per data set. %d data "
   6564                     "sets and %d colors were provided" % (nx, len(color)))
-> 6565                 raise ValueError(error_message)
   6566 
   6567         # If bins are not specified either explicitly or via range,

ValueError: color kwarg must have one color per data set. 30 data sets and 1 colors were provided

I think you need a bar chart instead of a histogram. It is also unclear what your values are. Assuming they are strings (judging by the plot), you first need to count their frequencies, for example with Counter from the collections module. Then you can plot the frequencies and use the keys as the tick labels.

from collections import Counter
from matplotlib import pyplot as plt

A = ['na', 'R', 'O', 'na', 'na', 'R']

# Count how often each value occurs
freqs = Counter(A)

# One bar per distinct value, labelled with the value itself
xvals = range(len(freqs))
plt.bar(xvals, list(freqs.values()), color='#37777D')
plt.xticks(xvals, list(freqs.keys()))
plt.show()

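If the "na" entries in the real column are actual missing values (NaN) rather than the string 'na', Counter still works, but pandas can do the counting directly. The following is a minimal sketch under that assumption; the Series below is a hypothetical stand-in for the question's column, not the original data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the question's column; np.nan plays the role of "na".
A = pd.Series(["R", "O", np.nan, np.nan, "O", "R", np.nan])

# dropna=False keeps missing values as their own category instead of dropping them.
counts = A.value_counts(dropna=False)

# Bars sit at integer positions; the category names become the tick labels.
labels = ["na" if pd.isna(k) else str(k) for k in counts.index]
positions = list(range(len(counts)))

fig, ax = plt.subplots()
ax.bar(positions, counts.to_numpy(), color="#37777D")
ax.set_xticks(positions)
ax.set_xticklabels(labels)
ax.set_ylabel("count")
```

Call plt.show() afterwards to display the figure. Without dropna=False, value_counts would silently leave out the missing values, and the "na" bar would not appear.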

I cannot reproduce this. If we create a DataFrame and run the following code, a histogram is drawn without the error:

import numpy as np; np.random.seed(42)
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.choice(["na", "O", "A"], size=10))

plt.hist(df.values, histtype='bar', bins=3)

plt.show()


That said, a histogram may not be the best choice here anyway, because histograms are meant for continuous data. For categorical values, one can create a bar plot of the counts instead.

import numpy as np; np.random.seed(42)
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.choice(["na", "O", "A"], size=10))

counts = df[0].value_counts()
plt.bar(counts.index, counts.values)

plt.show()

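value_counts sorts the bars by frequency, so their order can change from one sample to the next. If a fixed order is wanted, one option is to reindex the counts before plotting. This is a sketch, assuming the three values from the question; the list named order is a hypothetical choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Sample values taken from the question; "na" is treated as a plain string here.
data = pd.Series(["na", "R", "O", "na", "na", "O", "R"])

# Hypothetical display order for the bars.
order = ["R", "O", "na"]

# reindex arranges the counts in that order; a category that never
# occurs in the data would get a count of 0 instead of NaN.
counts = data.value_counts().reindex(order, fill_value=0)

fig, ax = plt.subplots()
ax.bar(list(counts.index), counts.to_numpy(), color="#37777D")
ax.set_ylabel("count")
```

As before, plt.show() displays the result. Passing the string labels directly as x values relies on Matplotlib's categorical axis support, which is the same mechanism the bar plot above uses.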
