
Plotting a histogram from a CSV file using matplotlib and pandas

My CSV file is fairly complex: it contains both numeric and string attributes. This is what it looks like: (screenshot of the CSV file). I want to plot a histogram of processes versus the cpuid.

You can read the file with read_csv, pull out the CPU number by indexing with .str, and plot the result with hist:

import pandas as pd
import matplotlib.pyplot as plt
import io

temp=u"""kmem_kmalloc;{cpu_id=1}
kmem_kmalloc;{cpu_id=1}
kmem_kmalloc;{cpu_id=1}
kmem_kmalloc;{cpu_id=1}
kmem_kfree;{cpu_id=1}
kmem_kfree;{cpu_id=1}
power_cpu_idle;{cpu_id=0}
power_cpu_idle;{cpu_id=0}
power_cpu_idle;{cpu_id=3}"""

s = pd.read_csv(io.StringIO(temp), # after testing, replace io.StringIO(temp) with the filename
                sep=";", # set the separator; can be omitted if it is ',' (the default)
                header=None, # the csv has no header row
                names=[None,'cpuid'], # column names (the first is None because it becomes the index)
                index_col=0) # use the first column as the index
s = s.squeeze("columns") # convert the one-column DataFrame to a Series (the squeeze=True argument was removed in pandas 2.0)
print(s)
kmem_kmalloc      {cpu_id=1}
kmem_kmalloc      {cpu_id=1}
kmem_kmalloc      {cpu_id=1}
kmem_kmalloc      {cpu_id=1}
kmem_kfree        {cpu_id=1}
kmem_kfree        {cpu_id=1}
power_cpu_idle    {cpu_id=0}
power_cpu_idle    {cpu_id=0}
power_cpu_idle    {cpu_id=3}
Name: cpuid, dtype: object
# if every cpu id is a single digit (max cpu <= 9), index with .str
s = s.str[-2].astype(int)

# if cpu ids can exceed 9, extract all the digits instead
#s = s.str.extract(r'(\d+)', expand=False).astype(int)
print(s)
kmem_kmalloc      1
kmem_kmalloc      1
kmem_kmalloc      1
kmem_kmalloc      1
kmem_kfree        1
kmem_kfree        1
power_cpu_idle    0
power_cpu_idle    0
power_cpu_idle    3
Name: cpuid, dtype: int32
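
The difference between the two extraction approaches only shows up with multi-digit CPU ids, which the sample data does not contain. A minimal sketch, assuming a hypothetical value `{cpu_id=12}` that is not in the original file:

```python
import pandas as pd

# Hypothetical sample values, including a two-digit CPU id (not from the original data)
raw = pd.Series(
    ["{cpu_id=1}", "{cpu_id=12}", "{cpu_id=0}"],
    index=["kmem_kmalloc", "kmem_kfree", "power_cpu_idle"],
    name="cpuid",
)

# .str[-2] takes only the character before the closing brace,
# so "{cpu_id=12}" would yield "2" instead of "12"
last_digit = raw.str[-2]

# (\d+) captures one or more digits; expand=False returns a Series
cpu = raw.str.extract(r"(\d+)", expand=False).astype(int)

print(last_digit.tolist())
print(cpu.tolist())
```

With values like these, the regular expression is the safer choice, since it does not depend on the position of the digits in the string.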

plt.figure()
s.hist(alpha=0.5)
plt.show()

(image: the resulting histogram)
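
If "processes versus the cpuid" means the number of occurrences of each event per CPU, a grouped bar chart may be closer to what is wanted than a plain histogram of the ids. The following is a sketch under that assumption, not part of the original answer; the column name `event` is made up here for illustration, and the `Agg` backend is chosen only so the example runs without a display.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; remove this line to use plt.show()
import pandas as pd

# Same sample data as in the answer above
temp = u"""kmem_kmalloc;{cpu_id=1}
kmem_kmalloc;{cpu_id=1}
kmem_kmalloc;{cpu_id=1}
kmem_kmalloc;{cpu_id=1}
kmem_kfree;{cpu_id=1}
kmem_kfree;{cpu_id=1}
power_cpu_idle;{cpu_id=0}
power_cpu_idle;{cpu_id=0}
power_cpu_idle;{cpu_id=3}"""

# Keep both columns as regular columns ("event" is a hypothetical name)
df = pd.read_csv(io.StringIO(temp), sep=";", header=None, names=["event", "cpuid"])
df["cpuid"] = df["cpuid"].str.extract(r"(\d+)", expand=False).astype(int)

# Rows: CPU ids, columns: event names, values: number of occurrences
counts = pd.crosstab(df["cpuid"], df["event"])
print(counts)

# One group of bars per CPU id, one bar per event
ax = counts.plot(kind="bar")
ax.set_xlabel("cpuid")
ax.set_ylabel("number of events")
```

For the sample data, CPU 1 ends up with four kmem_kmalloc and two kmem_kfree events, while power_cpu_idle appears twice on CPU 0 and once on CPU 3. In an interactive session, drop the `matplotlib.use("Agg")` line and call `plt.show()` after importing `matplotlib.pyplot as plt`.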
