R: How to visualize change in binary/categorical data over time
>dput(data)
structure(list(ID = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3,
3, 3), Dx = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1), Month = c(0,
6, 12, 18, 24, 0, 6, 12, 18, 24, 0, 6, 12, 18, 24), score = c(0,
0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0)), .Names = c("ID",
"Dx", "Month", "score"), row.names = c(NA, -15L), class = "data.frame")
>data
   ID Dx Month score
1   1  1     0     0
2   1  1     6     0
3   1  1    12     0
4   1  1    18     1
5   1  1    24     1
6   2  1     0     1
7   2  1     6     1
8   2  2    12     1
9   2  2    18     0
10  2  2    24     1
11  3  1     0     0
12  3  1     6     0
13  3  1    12     0
14  3  1    18     0
15  3  1    24     0
Suppose I have the above data.frame. I have 3 patients (ID = 1, 2 or 3). Dx is the diagnosis (Dx = 1 is normal, Dx = 2 is diseased). There is a Month variable. Last but not least, there is a test score variable. Each participant's test score is binary, and it can change from 0 to 1 or revert from 1 to 0. I am having trouble coming up with a way to visualize this data. I would like an informative graph that looks at:
In my real dataset I have over 800 participants, so I do not want to construct 800 separate graphs. I think the test score variable being binary is what really has me stumped. Any help would be appreciated.
With ggplot2 you can make faceted plots with a subplot for each patient (see further down for ways of dealing with a large number of plots). An example visualization:
library(ggplot2)
ggplot(data, aes(x=Month, y=score, color=factor(Dx))) +
geom_point(size=5) +
scale_x_continuous(breaks=c(0,6,12,18,24)) +
scale_color_discrete("Diagnosis",labels=c("normal","diseased")) +
facet_grid(.~ID) +
theme_bw()
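This gives one panel per patient, with Month on the x-axis, the binary score on the y-axis, and points colored by diagnosis.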
Including 800 patients in one plot might be a bit too much, as already mentioned in the comments on the question. There are several solutions to this problem:
With regard to the last suggestion, you can do that with the following code (which I adapted from an answer to one of my own questions). It drops every patient whose Dx is 1 at all time points, keeping only patients who were diagnosed as diseased at least once:
deleteable <- with(data, ave(Dx, ID, FUN=function(x) all(x==1)))
data2 <- data[deleteable==0,]
You can use the same expression to create a new variable that flags patients who have never been ill (1 = never ill, 0 = ill at some point):
data$neverill <- with(data, ave(Dx, ID, FUN=function(x) all(x==1)))
Then you can, for example, aggregate the data by several grouping variables (e.g. Month, neverill).
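The aggregation step could be sketched as follows. This is only one possible approach, assuming the sample data from the question and the neverill flag defined above; the name agg and the choice of the mean (the proportion of positive scores) as the summary are illustrative assumptions, not part of the original answer.

```r
library(ggplot2)

# Sample data from the question
data <- data.frame(
  ID    = rep(1:3, each = 5),
  Dx    = c(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 1, 1, 1, 1, 1),
  Month = rep(c(0, 6, 12, 18, 24), times = 3),
  score = c(0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0)
)

# 1 if the patient was never diagnosed as diseased, 0 otherwise
data$neverill <- with(data, ave(Dx, ID, FUN = function(x) all(x == 1)))

# Proportion of positive test scores per month within each group
# (agg is a hypothetical name used for illustration)
agg <- aggregate(score ~ Month + neverill, data = data, FUN = mean)

# One line per group instead of one panel per patient
ggplot(agg, aes(x = Month, y = score, color = factor(neverill))) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = c(0, 6, 12, 18, 24)) +
  scale_color_discrete("Never ill", labels = c("no", "yes")) +
  ylab("Proportion with score = 1") +
  theme_bw()
```

Because the plot shows one line per group rather than one panel per patient, its size does not grow with the number of participants.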
Note: Most of the data manipulation below is needed for part 2. Part 1 is less complex, and you can see where it fits in below.
Uses
library(data.table)
library(ggplot2)
library(reshape2)
To Compare
First, recode Dx from 1/2 to 0/1 (assuming that a 0 in score corresponds to a 1 in the original Dx):
data$Dx <- data$Dx - 1
Now, create a lookup matrix (rows are Dx, columns are score) that returns 1 for a diagnosis of 1 with a test score of 0, and -1 for a test score of 1 with a diagnosis of 0:
compare <- matrix(c(0,1,-1,0),ncol = 2,dimnames = list(c(0,1),c(0,1)))
> compare
  0  1
0 0 -1
1 1  0
Now, let's score every event. This simply looks up the matrix above for every row in your data:
data$calc <- diag(compare[as.character(data$Dx),as.character(data$score)])
*Note: The lookup builds an n-by-n matrix before taking its diagonal, so it can be sped up for large data sets using matching, but it is a quick fix for smaller sets like yours.
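One way to do that matching, sketched here as an assumption rather than as the answerer's own method, is base R's matrix indexing: a two-column character matrix of (row name, column name) pairs picks one cell per observation. The names idx, Dx and score (as standalone vectors) are illustrative.

```r
# Lookup table from the answer: rows are Dx (0/1), columns are score (0/1)
compare <- matrix(c(0, 1, -1, 0), ncol = 2,
                  dimnames = list(c(0, 1), c(0, 1)))

# Sample data from the question, with Dx already recoded to 0/1
Dx    <- c(0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0)
score <- c(0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0)

# One (row name, column name) pair per observation; indexing with this
# matrix returns a single cell per pair, with no n-by-n intermediate
idx  <- cbind(as.character(Dx), as.character(score))
calc <- compare[idx]

# Check against the diag() approach
all(calc == diag(compare[as.character(Dx), as.character(score)]))
```

With a data frame, the same idea would read data$calc <- compare[cbind(as.character(data$Dx), as.character(data$score))].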
To allow us to use data.table aggregation:
data <- data.table(data)
Now we need to create our variables:
tograph <- melt(data[, list(ScoreTrend = sum(score)/.N,
Type = sum(calc)/length(calc[calc != 0]),
Measure = sum(abs(calc))),
by = Month],
id.vars = c("Month"))
We melt this data frame along Month so that we can create a faceted graph.
If there are no incorrect events in a month, we will get a NaN for Type (0 divided by 0). A comparison with == never matches NaN, so use is.nan() to set these values to 0:
tograph[is.nan(value), value := 0]
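A quick base-R check illustrates how NaN behaves under comparison, which matters when selecting the rows to reset. The vector below holds made-up example values, not output copied from the original answer.

```r
# Example values with one NaN, as produced by 0 / 0
value <- c(-1, -1, 0 / 0, 0, -1)

# Comparing anything with NaN using == yields NA, never TRUE
value == NaN

# is.nan() is TRUE only for the NaN element
is.nan(value)

# Replace the NaN with 0
value[is.nan(value)] <- 0
value
```

The same logical test can be used in the i argument of a data.table to select the rows to update.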
Finally, we can plot:
ggplot(tograph, aes(x = Month, y = value)) + geom_line() + facet_wrap(~variable, ncol = 1)
We can now see, in one plot: