Big data visualization for multiple sampled data points from a large log

Question

I have a log file that I need to plot in Python as a multi-line plot, with one line for each unique data point. The problem is that some points are missing from some samples while new points appear in others. In the example below, each line is one sample of n points, where n varies:

2015-06-20 16:42:48,135 current stats=[ ('keypassed', 13), ('toy', 2), ('ball', 2),('mouse', 1) ...]
2015-06-21 16:42:48,135 current stats=[ ('keypassed', 20), ('toy', 5), ('ball', 7), ('cod', 1), ('fish', 1) ... ]

In the first sample above, 'mouse' is present, but it is absent from the second sample, which adds new data points such as 'cod' and 'fish'.

How can this be done in Python in the quickest and cleanest way? Are there any existing Python utilities that can help plot this kind of timestamped log file? Since this is a log file, there are thousands of samples, so the visualization needs to display them properly.

I am interested in applying multivariate hexagonal binning to this data, with a different hexagon color for each unique column ('ball', 'mouse', etc.). scikit offers hexagonal binning, but I can't figure out how to render a different color for each hexagon based on the unique data point. Any other visualization technique would also help.

Answer 1

Getting the data into pandas:

import ast
import pandas as pd

rows = []
with open(logfilepath) as f:  # logfilepath: path to the log file
    for line in f:
        timestamp = line.split(',')[0]
        # the data part of each line can be parsed as a Python list literal
        # (ast.literal_eval is a safer replacement for eval)
        data = ast.literal_eval(line.split('=', 1)[1].strip())
        # convert the input data from wide format to long format
        for name, value in data:
            rows.append({'timestamp': timestamp, 'name': name, 'value': value})
# build the frame once; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame(rows, columns=['timestamp', 'name', 'value'])

#convert from long format back to wide format, and fill null values with 0
df2 = df.pivot_table(index = 'timestamp', columns = 'name')
df2 = df2.fillna(0)
df2
Out[142]: 
                    value                             
name                 ball cod fish keypassed mouse toy
timestamp                                             
2015-06-20 16:42:48     2   0    0        13     1   2
2015-06-21 16:42:48     7   1    1        20     0   5
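
The timestamps in the frame above are still plain strings, so pandas treats them as labels rather than as points in time. With thousands of samples it may help to parse them and downsample before plotting. The following is a minimal sketch under stated assumptions: the frame `wide` and its numbers are hypothetical stand-ins for `df2['value']`, and daily averaging is only one possible choice of aggregation.

```python
import pandas as pd

# Hypothetical stand-in for df2['value']: string timestamps as the index.
wide = pd.DataFrame(
    {"ball": [2, 7, 4], "keypassed": [13, 20, 25], "mouse": [1, 0, 0]},
    index=["2015-06-20 16:42:48", "2015-06-21 09:00:00", "2015-06-21 16:42:48"],
)

# Parse the index so pandas treats it as time rather than as text labels.
wide.index = pd.to_datetime(wide.index)

# Downsample to one row per calendar day ("D" is the daily frequency alias),
# averaging all samples that fall on the same day.
daily = wide.resample("D").mean()
print(daily)
```

Calling `daily.plot()` then draws one line per column against a real time axis. Whether a mean, a maximum, or the last value per day is the right summary depends on what the counters in the log represent.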

Plot the data:

import matplotlib.pyplot as plt
df2.value.plot()
plt.show()
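
For the hexagonal binning mentioned in the question, one possible approach is to call matplotlib's `Axes.hexbin` once per column, giving each call its own colormap. This is a sketch rather than a definitive solution: the frame `wide` is filled with synthetic random numbers standing in for `df2['value']`, and the colormap names, `gridsize`, and `alpha` values are arbitrary choices.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import numpy as np
import pandas as pd

# Synthetic stand-in for df2['value'] (made-up numbers, for illustration only).
rng = np.random.default_rng(0)
index = pd.date_range("2015-06-20", periods=2000, freq=pd.Timedelta(hours=1))
wide = pd.DataFrame(
    {
        "ball": rng.poisson(5, len(index)),
        "toy": rng.poisson(2, len(index)),
        "keypassed": rng.poisson(15, len(index)),
    },
    index=index,
)

# Hours elapsed since the first sample, used as the x coordinate.
x = np.asarray((wide.index - wide.index[0]).total_seconds()) / 3600.0

cmaps = ["Blues", "Reds", "Greens"]  # one colormap per column
fig, ax = plt.subplots()
layers, handles = [], []
for column, cmap in zip(wide.columns, cmaps):
    # mincnt=1 leaves empty hexagons undrawn so the layers do not hide each other.
    layer = ax.hexbin(x, wide[column].to_numpy(), gridsize=40,
                      cmap=cmap, mincnt=1, alpha=0.6)
    layers.append(layer)
    # Proxy patch so the legend shows one representative color per column.
    handles.append(Patch(color=plt.get_cmap(cmap)(0.7), label=column))

ax.set_xlabel("hours since first sample")
ax.set_ylabel("value")
ax.legend(handles=handles)
fig.savefig("hexbin_by_column.png")
```

Each column becomes its own semi-transparent hexbin layer, so the color family identifies the column and the shade within it reflects how many samples fell into that hexagon. With many columns the overlaid layers become hard to read, and one subplot per column may be clearer.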


solution1
1 2015-06-22 01:51:08
