R - Character Column loses value when stacking columns
I'm having a strange problem stacking data frame columns into 3 columns. For some reason, the factor column loses its values when it's stacked.
When I use the code below, in theory the Treatment values should stack on top of each other, not get replaced by one value.
library(reshape2)
test1<-reshape(df, direction="long", varying=split(names(df), rep(seq_len(ncol(df)/4), 3)))
I won't paste the entire result, but this frequency table should suffice:
Duplicate column names are causing this issue for you. A better way is to split the columns into blocks, correct the column names, and then bind the blocks together using rbind. I tried to keep all information by creating two new columns to store the information of q3_...
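# Take the columns three at a time, record the source column names, then stack the blocks
do.call('rbind', lapply(seq(3, 12, by = 3), function(x) {
  y <- df1[, (x - 2):x]  # one block: month, year, Treatment
  y <- do.call("cbind", list(mo = colnames(y)[1], yr = colnames(y)[2], y))
  colnames(y)[3:4] <- c('mo_val', 'yr_val')
  y
}))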
# mo yr mo_val yr_val Treatment
# 1: q3_1mo q3_1yr NA NA anti-androgen
# 2: q3_1mo q3_1yr 5 2012 anti-androgen
# 3: q3_1mo q3_1yr 4 2008 anti-androgen
# 4: q3_1mo q3_1yr 4 2010 anti-androgen
# 5: q3_1mo q3_1yr NA NA anti-androgen
# 6: q3_1mo q3_1yr 2 2008 anti-androgen
# 7: q3_2mo q3_2yr 8 2010 docetaxel
# 8: q3_2mo q3_2yr 5 2012 docetaxel
# 9: q3_2mo q3_2yr 4 2008 docetaxel
# 10: q3_2mo q3_2yr 4 2010 docetaxel
# 11: q3_2mo q3_2yr 8 2011 docetaxel
# 12: q3_2mo q3_2yr 2 2008 docetaxel
# 13: q3_3mo q3_3yr NA NA abiraterone
# 14: q3_3mo q3_3yr 5 2012 abiraterone
# 15: q3_3mo q3_3yr 4 2008 abiraterone
# 16: q3_3mo q3_3yr 4 2010 abiraterone
# 17: q3_3mo q3_3yr 8 2011 abiraterone
# 18: q3_3mo q3_3yr 2 2008 abiraterone
# 19: q3_3mo q3_3yr NA NA other
# 20: q3_3mo q3_3yr 5 2012 other
# 21: q3_3mo q3_3yr 4 2008 other
# 22: q3_3mo q3_3yr 4 2010 other
# 23: q3_3mo q3_3yr 8 2011 other
# 24: q3_3mo q3_3yr 2 2008 other
# mo yr mo_val yr_val Treatment
Data:
df1 <- structure(list(q3_1mo = c(NA, 5L, 4L, 4L, NA, 2L),
q3_1yr = c(NA, 2012L, 2008L, 2010L, NA, 2008L),
Treatment = c("anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen"),
q3_2mo = c(8L, 5L, 4L, 4L, 8L, 2L),
q3_2yr = c(2010L, 2012L, 2008L, 2010L, 2011L, 2008L),
Treatment = c("docetaxel", "docetaxel", "docetaxel", "docetaxel", "docetaxel", "docetaxel"),
q3_3mo = c(NA, 5L, 4L, 4L, 8L, 2L),
q3_3yr = c(NA, 2012L, 2008L, 2010L, 2011L, 2008L),
Treatment = c("abiraterone", "abiraterone", "abiraterone", "abiraterone", "abiraterone", "abiraterone"),
q3_3mo = c(NA, 5L, 4L, 4L, 8L, 2L),
q3_3yr = c(NA, 2012L, 2008L, 2010L, 2011L, 2008L),
Treatment = c("other", "other", "other", "other", "other", "other")),
.Names = c("q3_1mo", "q3_1yr", "Treatment", "q3_2mo", "q3_2yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment"),
row.names = c(NA, -6L), class = "data.frame")
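To illustrate the diagnosis that duplicated column names are the culprit, here is a minimal sketch using a hypothetical two-column data frame (toy, by_name and by_pos are names made up for this example). It shows that looking a column up by name only ever reaches the first match, which is presumably why a name-based reshape keeps returning the first Treatment column.

```r
# Hypothetical toy data: two columns that share the name "Treatment".
# check.names = FALSE stops data.frame() from repairing the duplicate.
toy <- data.frame(Treatment = c("anti-androgen", "anti-androgen"),
                  Treatment = c("docetaxel", "docetaxel"),
                  check.names = FALSE, stringsAsFactors = FALSE)

names(toy)                      # both columns are called "Treatment"

by_name <- toy[["Treatment"]]   # lookup by name: only the first match is found
by_pos  <- toy[[2]]             # lookup by position: reaches the second column

by_name
by_pos
```

Selecting by position, as the rbind approach above does, sidesteps the ambiguity; renaming the columns so they are unique removes it.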
You can also fix this and keep the same code by giving your variables unique names with make.unique.
names(df) <- make.unique(names(df))
test1 <- reshape(df, direction="long",
varying=split(names(df), rep(seq_len(ncol(df)/4), 3)))
This returns
test1
time q3_1mo q3_1yr Treatment id
1.1 1 NA NA anti-androgen 1
2.1 1 5 2012 anti-androgen 2
3.1 1 4 2008 anti-androgen 3
4.1 1 4 2010 anti-androgen 4
5.1 1 NA NA anti-androgen 5
6.1 1 2 2008 anti-androgen 6
1.2 2 8 2010 docetaxel 1
2.2 2 5 2012 docetaxel 2
3.2 2 4 2008 docetaxel 3
4.2 2 4 2010 docetaxel 4
5.2 2 8 2011 docetaxel 5
6.2 2 2 2008 docetaxel 6
1.3 3 NA NA abiraterone 1
2.3 3 5 2012 abiraterone 2
3.3 3 4 2008 abiraterone 3
4.3 3 4 2010 abiraterone 4
5.3 3 8 2011 abiraterone 5
6.3 3 2 2008 abiraterone 6
1.4 4 NA NA other 1
2.4 4 5 2012 other 2
3.4 4 4 2008 other 3
4.4 4 4 2010 other 4
5.4 4 8 2011 other 5
6.4 4 2 2008 other 6
You would have to spend a couple of lines cleaning up the names and maybe removing some columns, but your code would go through. Also, note that reshape is a base R function, so loading reshape2 is unnecessary.
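The clean-up step might look like the following sketch. To run on its own it uses a cut-down copy of the data (the first two rows of the first two column blocks of df1). The names groups, long, mo and yr are illustrative choices, not part of the original answers. Passing v.names to reshape is one possible way to get tidy column names directly rather than renaming afterwards.

```r
# Cut-down copy of df1: first two rows, first two column blocks
df <- data.frame(q3_1mo = c(NA, 5L), q3_1yr = c(NA, 2012L),
                 Treatment = c("anti-androgen", "anti-androgen"),
                 q3_2mo = c(8L, 5L), q3_2yr = c(2010L, 2012L),
                 Treatment = c("docetaxel", "docetaxel"),
                 check.names = FALSE, stringsAsFactors = FALSE)

# Make the duplicated name unique: "Treatment", "Treatment.1"
names(df) <- make.unique(names(df))

# Every third column belongs to the same stacked variable
groups <- split(names(df), rep(1:3, length.out = ncol(df)))

# v.names (an illustrative choice) names the stacked columns directly
long <- reshape(df, direction = "long", varying = groups,
                v.names = c("mo", "yr", "Treatment"))

# Keep only the columns of interest and drop the composite row names
long <- long[, c("time", "mo", "yr", "Treatment")]
rownames(long) <- NULL
long
```

Here time records which block each row came from, so it can stand in for the original q3_1 and q3_2 prefixes if that information is needed.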