
Parallel coordinates plot with skipped coordinates

People race on 100 m, 400 m, and 1600 m tracks, and their finish times are recorded. I want to present the data for each racer in a parallel coordinates plot. Some racers may not finish a track. In that case I would like to mark it somehow, either with a point at infinity or with a distinct color for that track.

As an example, I made a parallel coordinates plot in Paint: [image]
Lazyman hasn't finished the 1600m track, and this is marked with an x.

An example data set is given in the following "racing.csv":

RACER,TRACK.100m,TRACK.400m,TRACK.1600m
Superman,0.1,0.5,1
Lazyman,200,900,Inf

I have tried a solution with pandas:

import pandas
# pandas.tools.plotting was removed in later pandas releases; the function now lives in pandas.plotting
from pandas.plotting import parallel_coordinates
import matplotlib.pyplot as plt

d = pandas.read_csv('racing.csv')

f = plt.figure()
parallel_coordinates(d, 'RACER')
f.axes[0].set_yscale('log')

plt.show()

This gives a plot without the Inf value for Lazyman at 1600m: [image]

I also prepared a csv for ggplot (there may be a better way to do this):

RACER,TRACK,TIME
Superman,100m,0.1
Superman,400m,0.5
Superman,1600m,1
Lazyman,100m,200
Lazyman,400m,900
Lazyman,1600m,Inf
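
The long table above does not have to be written by hand; it can be derived from the wide one. A minimal pandas sketch, assuming the third column of the wide file is named TRACK.1600m (the names wide and long_df are only illustrative, and the data is typed in directly rather than read from racing.csv):

```python
import pandas as pd

# Wide-format data mirroring racing.csv (column name TRACK.1600m is assumed).
wide = pd.DataFrame({
    "RACER": ["Superman", "Lazyman"],
    "TRACK.100m": [0.1, 200.0],
    "TRACK.400m": [0.5, 900.0],
    "TRACK.1600m": [1.0, float("inf")],  # Inf marks an unfinished race
})

# One row per (racer, track) pair, as ggplot expects.
long_df = wide.melt(id_vars="RACER", var_name="TRACK", value_name="TIME")

# Strip the "TRACK." prefix so the labels read 100m, 400m, 1600m.
long_df["TRACK"] = long_df["TRACK"].str.replace("TRACK.", "", regex=False)

# CSV text in the RACER,TRACK,TIME layout (row order differs from the hand-made file).
csv_text = long_df.to_csv(index=False)
```

The rows come out grouped by track rather than by racer, which should not matter for plotting because the grouping is done by the RACER column.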

Using ggplot2:

require(ggplot2)
d <- read.csv('racing2.csv')
g <- ggplot(d) + geom_line(aes(x=TRACK,y=TIME,group=RACER, color=RACER))
g <- g + scale_y_log10()
ggsave('ggplot.png')

I got closer, since this plot does show the infinity value, but it does not annotate it:

[image]

Any solution, either Python or R, will be appreciated. Also, suggestions regarding marking unfinished races are appreciated.

With R and ggplot2:

Build some bogus data:

df <- data.frame(ID = factor(c(rep(1, 3), rep(2, 3), rep(3, 3)), labels = c('Realman', 'Lazyman', 'Superman')),
             race = factor(rep(seq(1,3,1), 3), labels = c('100m', '400m', '1600m')),
             runTime = c(8.9, 20.5, 150.9, 100.1, 300.3, +Inf, 1.2, 5, +Inf))

#         ID  race runTime
# 1  Realman  100m     8.9
# 2  Realman  400m    20.5
# 3  Realman 1600m   150.9
# 4  Lazyman  100m   100.1
# 5  Lazyman  400m   300.3
# 6  Lazyman 1600m     Inf
# 7 Superman  100m     1.2
# 8 Superman  400m     5.0
# 9 Superman 1600m     Inf

Result:

[image]

Code:

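# Packages used below: filter() comes from dplyr, brewer.pal() from RColorBrewer
library(ggplot2)
library(dplyr)
library(RColorBrewer)

# Solid lines and large points use only the finished races; the dashed layers use all rows
ggplot(filter(df, runTime != +Inf), aes(x = race, y = runTime, group = ID, color = ID)) + 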
    geom_line(size = 2) +
    geom_point(size = 4) +

    geom_line(data = df, linetype = 'dashed', size = 1) +        
    geom_point(data = df, shape = 21, size = 1) +

    geom_text(aes(label = runTime), position = position_nudge(y = -.1)) +

    scale_y_continuous(trans = 'log10', breaks = c(1, 10, 100, 1000)) +
    scale_x_discrete('Track') +
    scale_color_manual('Racer', values = brewer.pal(length(levels(df$ID)), 'Set1')) +

    theme(panel.background = element_blank(),
          panel.grid.major.x = element_line(colour = 'lightgrey', size = 25),
          legend.position = 'top',
          axis.line.y = element_line('black', .5, arrow = arrow()))
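
Since the question allows either language, a similar effect can be sketched in Python with matplotlib. This is not a translation of the ggplot2 code above, only an illustration under stated assumptions: infinite times are drawn at an arbitrary stand-in height (dnf_factor times the slowest finite time, a made-up choice) and marked with an x. The names plot_races and dnf_factor are hypothetical and exist only in this sketch.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend; omit this line for interactive use
import matplotlib.pyplot as plt

tracks = ["100m", "400m", "1600m"]
times = {
    "Realman": [8.9, 20.5, 150.9],
    "Lazyman": [100.1, 300.3, math.inf],
    "Superman": [1.2, 5.0, math.inf],
}


def plot_races(tracks, times, dnf_factor=3.0):
    """Plot finish times on a log axis; unfinished races get an 'x' marker."""
    finite = [t for row in times.values() for t in row if math.isfinite(t)]
    # Assumption: draw Inf at a stand-in height above every finite time.
    dnf_level = max(finite) * dnf_factor
    xs = list(range(len(tracks)))

    fig, ax = plt.subplots()
    for racer, row in times.items():
        # Thin dashed line through all points, with Inf replaced by the stand-in.
        shown = [t if math.isfinite(t) else dnf_level for t in row]
        (dashed,) = ax.plot(xs, shown, linestyle="--", linewidth=1)
        color = dashed.get_color()

        # Thick solid line over finished races only; NaN breaks the line.
        solid = [t if math.isfinite(t) else math.nan for t in row]
        ax.plot(xs, solid, marker="o", linewidth=2, color=color, label=racer)

        # 'x' markers where the race was not finished.
        dnf_x = [x for x, t in zip(xs, row) if not math.isfinite(t)]
        ax.plot(dnf_x, [dnf_level] * len(dnf_x), linestyle="none",
                marker="x", markersize=10, color=color)

    ax.set_yscale("log")
    ax.set_xticks(xs)
    ax.set_xticklabels(tracks)
    ax.set_xlabel("Track")
    ax.set_ylabel("Finish time")
    ax.legend(title="Racer")
    return fig, ax, dnf_level


fig, ax, dnf_level = plot_races(tracks, times)
```

Calling fig.savefig with a file name writes the figure to disk. The stand-in height only places the marker; the tick at that level could be relabelled to make clear that it does not represent a measured time.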
