
Big data visualization for multiple sampled data points from a large log

I have a log file that I need to plot in Python as a multi-line plot, with one line for each unique data point. The problem is that some points are missing from some samples while new points appear in others. In the example below, each line is one sample of n points, where n varies:

2015-06-20 16:42:48,135 current stats=[ ('keypassed', 13), ('toy', 2), ('ball', 2),('mouse', 1) ...]
2015-06-21 16:42:48,135 current stats=[ ('keypassed', 20), ('toy', 5), ('ball', 7), ('cod', 1), ('fish', 1) ... ]

In the first sample above, 'mouse' is present, but it is absent from the second sample, which in turn adds new data points such as 'cod' and 'fish'.

So how can this be done in Python in the quickest and cleanest way? Are there any existing Python utilities that can help plot this timestamped log file? Also, since this is a log file, there are thousands of samples, so the visualization needs to display them properly.

I am interested in applying multivariate hexagonal binning to this, with a different hexagon color for each unique column ("ball", "mouse", etc.). scikit offers hexagonal binning, but I can't figure out how to render a different color for each hexagon based on the unique data point. Any other visualization technique would also help.

Getting the data into pandas:

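import ast
import pandas as pd

rows = []
with open(logfilepath) as f:  # logfilepath: path to the log file
    for line in f:
        if '=' not in line:
            continue  # skip blank or unrelated lines
        # the timestamp is everything before the first comma (milliseconds are dropped)
        timestamp = line.split(',')[0]
        # the data part of each line is a Python list literal; parse it without eval
        data = ast.literal_eval(line.split('=', 1)[1].strip())
        # convert the input data from wide format to long format
        for name, value in data:
            rows.append({'timestamp': timestamp, 'name': name, 'value': value})

# DataFrame.append was removed in pandas 2.0, so build the frame in one step
df = pd.DataFrame(rows, columns=['timestamp', 'name', 'value'])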

#convert from long format back to wide format, and fill null values with 0
df2 = df.pivot_table(index = 'timestamp', columns = 'name')
df2 = df2.fillna(0)
df2
Out[142]: 
                    value                             
name                 ball cod fish keypassed mouse toy
timestamp                                             
2015-06-20 16:42:48     2   0    0        13     1   2
2015-06-21 16:42:48     7   1    1        20     0   5
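
With thousands of samples it may help to keep the timestamps as real datetimes rather than strings, so that pandas can resample them before plotting. The following is only a sketch under assumptions: `sample_lines` is a hypothetical in-memory stand-in for the log (with the elided "..." parts left out), the split on " current stats=" assumes every line has that exact layout, and the daily resampling interval is an arbitrary choice.

```python
import ast
import pandas as pd

# Hypothetical sample lines in the same layout as the log (elided parts omitted)
sample_lines = [
    "2015-06-20 16:42:48,135 current stats=[('keypassed', 13), ('toy', 2), ('ball', 2), ('mouse', 1)]",
    "2015-06-21 16:42:48,135 current stats=[('keypassed', 20), ('toy', 5), ('ball', 7), ('cod', 1), ('fish', 1)]",
]

rows = []
for line in sample_lines:
    stamp, payload = line.split(" current stats=", 1)
    # "%f" accepts the 3-digit millisecond part that follows the comma
    when = pd.to_datetime(stamp, format="%Y-%m-%d %H:%M:%S,%f")
    for name, value in ast.literal_eval(payload):
        rows.append({"timestamp": when, "name": name, "value": value})

long_df = pd.DataFrame(rows)

# Passing values= gives flat column names; fill_value=0 covers absent points
wide = long_df.pivot_table(
    index="timestamp", columns="name", values="value", fill_value=0
)

# Downsample to one averaged row per day to thin out a dense log
daily = wide.resample("D").mean()
```

Plotting `daily` instead of `wide` draws fewer points per line, which is one way to keep a long log readable.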

Plot the data:

import matplotlib.pyplot as plt
df2.value.plot()
plt.show()
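
For the hexagonal binning mentioned in the question, one possible approach is matplotlib's `Axes.hexbin`, called once per column with a different colormap each time. This is a rough sketch rather than a tested recipe for the real log: the `wide` table here is synthetic random data, the colormap names, grid size, transparency and output filename are arbitrary choices, and the `Agg` backend is selected only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the wide table (assumption: real data comes from the log)
rng = np.random.default_rng(0)
index = pd.date_range("2015-06-20", periods=2000, freq="30min")
wide = pd.DataFrame(
    {"ball": rng.poisson(5, len(index)), "toy": rng.poisson(12, len(index))},
    index=index,
)

# hexbin needs numeric x values, so convert timestamps to matplotlib date numbers
x = mdates.date2num(wide.index.to_pydatetime())
# A shared extent makes every series use the same hexagon grid
extent = (x.min(), x.max(), 0, float(wide.to_numpy().max()))

colormaps = ["Blues", "Reds"]  # one colormap per series, chosen arbitrarily
fig, ax = plt.subplots(figsize=(10, 4))
collections = {}
for column, cmap in zip(wide.columns, colormaps):
    # mincnt=1 leaves empty hexagons undrawn so the layers do not hide each other
    collections[column] = ax.hexbin(
        x, wide[column].to_numpy(), gridsize=40, extent=extent,
        cmap=cmap, mincnt=1, alpha=0.6,
    )
    fig.colorbar(collections[column], ax=ax, label=f"{column} count")

ax.xaxis_date()  # render the numeric x axis as dates
ax.set_xlabel("timestamp")
ax.set_ylabel("value")
fig.savefig("hexbin_by_series.png")  # hypothetical output filename
```

Each hexagon's shade shows how many samples of that series fall into it, and the colormap identifies the series. With many columns the overlapping layers become hard to read, so one subplot per column may work better.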
