R - ggplot2 parallel categorical plot

I am working with categorical longitudinal data. My data has three variables, for example:

       id variable value
1       1        1     c
2       1        2     b
3       1        3     c
4       1        4     c
5       1        5     c
...

Here variable is essentially time, and value is one of the 3 possible categories an id can take at that time.

I am interested in producing a "parallel" longitudinal graph, similar to this with ggplot2

[image]

I am struggling a bit to get it right. What I have come up with so far is this:

dt0 %>% ggplot(aes(variable, value, group = id, colour = id)) +
  geom_line(colour="grey70") +
  geom_point(aes(colour=value, size = nn), size=4) + 
  scale_colour_brewer(palette="Set1") + theme_minimal()

[image]

The issue with this graph is that we can't really see the "thickness" of the "transition" (the id lines).

Could you help me with the following:

a) make the id lines visible, or make them "thicker" according to the number of ids going from one state to the other

b) I would also like to size the points according to the number of ids in each state. I tried to do it with geom_point(aes(colour=value, size = nn), size=4) but it doesn't seem to work.

Thanks.

# data # 
library(dplyr) 
library(ggplot2) 
library(reshape2) # provides melt(), used below

set.seed(10)

# generate random sequences # 
dt = as.data.frame( cbind(id = 1:1000, replicate(5, sample( c('a', 'b', 'c'), prob = c(0.1,0.2,0.7), 1000, replace = TRUE)) ) ) 

# transform to long format (one row per id and time point) # 
dt = dt %>% melt(id.vars = c('id'))

# replace variable with a time index 1..5 within each id # 
dt0 = dt %>% group_by(id) %>% mutate(variable = 1:n()) %>% arrange(id)

# create the number of people in that state # 
dt0 = dt0 %>% count(id, variable, value)
dt0 = dt0 %>% group_by(variable, value, n) %>% mutate(nn = n()) 

# to produce the first graph # 
library(vcrpart) 
otsplot(dt0$variable, factor(dt0$value), dt0$id)

You were so close with geom_point(aes(colour=value, size = nn), size=4). The problem is that you set size again after mapping it in aes(), so ggplot overwrote the variable mapping with the constant 4. Assuming you want to use nn to scale line thickness as well, you could tweak your code to this:

dt0 %>% ggplot(aes(variable, value, group = id, colour = id)) +
    geom_line(colour="grey70", aes(size = nn)) +
    geom_point(aes(colour=value, size = nn)) + 
    scale_colour_brewer(palette="Set1") + theme_minimal()
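The overriding behaviour described above can be checked in isolation. This is only a minimal sketch on a made-up three-row data frame (df and its column w are hypothetical, not part of the question's data). It uses layer_data() to inspect the sizes ggplot2 actually computes for each point.

```r
library(ggplot2)

# Hypothetical toy data: w plays the role of nn in the question.
df <- data.frame(x = 1:3, y = 1:3, w = c(1, 5, 10))

# size is mapped in aes() AND set as a constant: the constant wins.
p_const <- ggplot(df, aes(x, y)) + geom_point(aes(size = w), size = 4)

# size is only mapped in aes(): point sizes follow w.
p_mapped <- ggplot(df, aes(x, y)) + geom_point(aes(size = w))

size_const  <- layer_data(p_const)$size
size_mapped <- layer_data(p_mapped)$size
```

Here size_const holds the same value for every point, while size_mapped varies with w, which is why dropping the trailing size=4 is enough to make the mapping take effect.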

If you wanted to use a lagged value for the line thickness, I would suggest adding that as a new column in dt0.

[image]
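One possible way to build the lag-based thickness mentioned above is sketched here. It is not the answer's own code. It uses a smaller stand-in data set (toy) instead of dt0, and the names prev_value, n_trans, transitions and states are made up for illustration. It also assumes ggplot2 3.4.0 or later, where line thickness is mapped with linewidth (older versions use size for lines). Transitions are drawn with geom_segment() rather than geom_line(), so each segment's thickness is the number of ids moving between the two states it connects.

```r
library(dplyr)
library(ggplot2)

set.seed(10)

# Stand-in for dt0: one row per id and time point (long format).
toy <- expand.grid(id = 1:200, variable = 1:5) %>%
  mutate(value = sample(c("a", "b", "c"), n(), replace = TRUE,
                        prob = c(0.1, 0.2, 0.7)))

# For each id, look up the state at the previous time point, then count
# how many ids share each (time, previous state, current state) transition.
transitions <- toy %>%
  arrange(id, variable) %>%
  group_by(id) %>%
  mutate(prev_value = lag(value)) %>%
  ungroup() %>%
  filter(!is.na(prev_value)) %>%   # the first time point has no predecessor
  count(variable, prev_value, value, name = "n_trans")

# Number of ids in each state at each time point (for the point sizes).
states <- toy %>% count(variable, value, name = "nn")

p <- ggplot() +
  geom_segment(data = transitions,
               aes(x = variable - 1, xend = variable,
                   y = prev_value, yend = value, linewidth = n_trans),
               colour = "grey70") +
  geom_point(data = states, aes(variable, value, colour = value, size = nn)) +
  scale_colour_brewer(palette = "Set1") +
  theme_minimal()
```

Printing p draws the plot. Because the transitions are aggregated before plotting, each state pair is drawn once instead of once per id, which also avoids heavy overplotting.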
