R - Character Column loses value when stacking columns

I'm having a strange problem stacking data frame columns into 3 columns. For some reason, the factor column loses its values when it's stacked.

When I use the code below, the Treatment values should, in theory, stack on top of each other rather than be replaced by a single value.

library(reshape2)
test1<-reshape(df, direction="long", varying=split(names(df), rep(seq_len(ncol(df)/4), 3)))

I won't paste the entire result, but this frequency table should suffice:

Duplicate column names are causing this issue. A better way is to split the columns into groups, correct the column names, and then bind the groups together using rbind. I tried to keep all the information by creating two new columns that store the original q3_... column names.

do.call('rbind', lapply(seq(3, 12, by = 3), function(x) {
  y <- df1[, (x - 2):x]
  # Keep the original column names as two extra columns
  y <- cbind(mo = colnames(y)[1], yr = colnames(y)[2], y)
  # Give the value columns common names so rbind can align them
  colnames(y)[3:4] <- c('mo_val', 'yr_val')
  y
}))

#         mo     yr mo_val yr_val     Treatment
# 1:  q3_1mo q3_1yr     NA     NA anti-androgen
# 2:  q3_1mo q3_1yr      5   2012 anti-androgen
# 3:  q3_1mo q3_1yr      4   2008 anti-androgen
# 4:  q3_1mo q3_1yr      4   2010 anti-androgen
# 5:  q3_1mo q3_1yr     NA     NA anti-androgen
# 6:  q3_1mo q3_1yr      2   2008 anti-androgen
# 7:  q3_2mo q3_2yr      8   2010     docetaxel
# 8:  q3_2mo q3_2yr      5   2012     docetaxel
# 9:  q3_2mo q3_2yr      4   2008     docetaxel
# 10: q3_2mo q3_2yr      4   2010     docetaxel
# 11: q3_2mo q3_2yr      8   2011     docetaxel
# 12: q3_2mo q3_2yr      2   2008     docetaxel
# 13: q3_3mo q3_3yr     NA     NA   abiraterone
# 14: q3_3mo q3_3yr      5   2012   abiraterone
# 15: q3_3mo q3_3yr      4   2008   abiraterone
# 16: q3_3mo q3_3yr      4   2010   abiraterone
# 17: q3_3mo q3_3yr      8   2011   abiraterone
# 18: q3_3mo q3_3yr      2   2008   abiraterone
# 19: q3_3mo q3_3yr     NA     NA         other
# 20: q3_3mo q3_3yr      5   2012         other
# 21: q3_3mo q3_3yr      4   2008         other
# 22: q3_3mo q3_3yr      4   2010         other
# 23: q3_3mo q3_3yr      8   2011         other
# 24: q3_3mo q3_3yr      2   2008         other
#         mo     yr mo_val yr_val     Treatment

Data:

df1 <- structure(list(q3_1mo = c(NA, 5L, 4L, 4L, NA, 2L), 
                      q3_1yr = c(NA, 2012L, 2008L, 2010L, NA, 2008L),
                      Treatment = c("anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen", "anti-androgen"),
                      q3_2mo = c(8L, 5L, 4L, 4L, 8L, 2L), 
                      q3_2yr = c(2010L, 2012L, 2008L, 2010L, 2011L, 2008L),
                      Treatment = c("docetaxel", "docetaxel", "docetaxel", "docetaxel", "docetaxel", "docetaxel"),
                      q3_3mo = c(NA, 5L, 4L, 4L, 8L, 2L),
                      q3_3yr = c(NA, 2012L, 2008L, 2010L, 2011L, 2008L), 
                      Treatment = c("abiraterone", "abiraterone", "abiraterone", "abiraterone", "abiraterone", "abiraterone"), 
                      q3_3mo = c(NA, 5L, 4L, 4L, 8L, 2L), 
                      q3_3yr = c(NA, 2012L, 2008L, 2010L, 2011L, 2008L),
                      Treatment = c("other", "other", "other", "other", "other", "other")), 
                 .Names = c("q3_1mo", "q3_1yr", "Treatment", "q3_2mo", "q3_2yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment", "q3_3mo", "q3_3yr", "Treatment"), 
                 row.names = c(NA, -6L), class = "data.frame")
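
To check whether duplicated names are present in your own data, something like the following may help. It is a minimal sketch on a made-up two-row data frame (the object names df, dup_names and first_match are only for illustration). It also shows that selecting a column by name returns only the first match, which is likely why a name-based lookup ends up repeating one Treatment value.

```r
# Hypothetical miniature of the data: two columns share the name "Treatment".
df <- data.frame(
  Treatment = c("anti-androgen", "anti-androgen"),
  q3_1mo    = c(NA, 5L),
  Treatment = c("docetaxel", "docetaxel"),
  check.names = FALSE, stringsAsFactors = FALSE
)

# Which names occur more than once?
dup_names <- unique(names(df)[duplicated(names(df))])
dup_names

# Selecting by name returns only the first matching column.
first_match <- df[["Treatment"]]
first_match
```

Here dup_names contains "Treatment", and first_match holds the values of the first Treatment column only; the second one is unreachable by name.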

You can also fix this and keep the same code by giving your variables unique names with make.unique.

names(df) <- make.unique(names(df))
test1 <- reshape(df, direction="long",
                 varying=split(names(df), rep(seq_len(ncol(df)/4), 3)))

This returns

test1

    time q3_1mo q3_1yr     Treatment id
1.1    1     NA     NA anti-androgen  1
2.1    1      5   2012 anti-androgen  2
3.1    1      4   2008 anti-androgen  3
4.1    1      4   2010 anti-androgen  4
5.1    1     NA     NA anti-androgen  5
6.1    1      2   2008 anti-androgen  6
1.2    2      8   2010     docetaxel  1
2.2    2      5   2012     docetaxel  2
3.2    2      4   2008     docetaxel  3
4.2    2      4   2010     docetaxel  4
5.2    2      8   2011     docetaxel  5
6.2    2      2   2008     docetaxel  6
1.3    3     NA     NA   abiraterone  1
2.3    3      5   2012   abiraterone  2
3.3    3      4   2008   abiraterone  3
4.3    3      4   2010   abiraterone  4
5.3    3      8   2011   abiraterone  5
6.3    3      2   2008   abiraterone  6
1.4    4     NA     NA         other  1
2.4    4      5   2012         other  2
3.4    4      4   2008         other  3
4.4    4      4   2010         other  4
5.4    4      8   2011         other  5
6.4    4      2   2008         other  6

You would have to spend a couple of lines cleaning up the names and maybe removing some columns, but your code would run. Also, note that reshape is a base R function, so loading reshape2 is unnecessary.
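
The cleanup could look roughly like this. It is only a sketch on a hypothetical two-row stand-in for the data. The long-format names mo and yr passed to v.names are my own choice, and using length.out in rep is a small variation that sizes the grouping vector to the number of columns.

```r
# Hypothetical two-row stand-in for the question's data, with a repeated
# "Treatment" column name (check.names = FALSE keeps the duplicate).
df <- data.frame(
  q3_1mo = c(NA, 5L), q3_1yr = c(NA, 2012L),
  Treatment = c("anti-androgen", "anti-androgen"),
  q3_2mo = c(8L, 5L), q3_2yr = c(2010L, 2012L),
  Treatment = c("docetaxel", "docetaxel"),
  check.names = FALSE, stringsAsFactors = FALSE
)

names(df) <- make.unique(names(df))

# Every third column belongs to the same long-format variable.
varying <- split(names(df), rep(1:3, length.out = ncol(df)))

# v.names sets the long-format column names (chosen here for illustration).
long <- reshape(df, direction = "long", varying = varying,
                v.names = c("mo", "yr", "Treatment"))

# Drop the generated id column and the "1.1"-style row names.
long$id <- NULL
rownames(long) <- NULL
long
```

Keep the id column instead if you need to trace each stacked row back to its original row.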
