
How to transform a data frame with different column names from wide to long

I have a data frame in wide format that I want to transform to long format (melting) so I can process it. The problem is that the "P" columns have different names, and the new data frame needs a new "Channel" column so that no information from the header is lost. Please see the image below for an illustration.

Here is the data frame:

df <- read.table(text =
"ID    T    P.1 P.2 P.3
1   24.3    10.2    5.5 2.1
2   23.4    10.4    5.7 2.8
3   22.1    10.5    5.9 3.1
4   19.9    10.2    5.2 2.4
", header = TRUE)

[Image: converting (melting) the wide format to the long format]

This is a fairly straightforward "wide" to "long" problem. Here are three approaches:

With "reshape2"

library(reshape2)
melt(df, id.vars = c("ID", "T"), variable.name = "Channel", value.name = "P")
#    ID    T Channel    P
# 1   1 24.3     P.1 10.2
# 2   2 23.4     P.1 10.4
# 3   3 22.1     P.1 10.5
# 4   4 19.9     P.1 10.2
# 5   1 24.3     P.2  5.5
# 6   2 23.4     P.2  5.7
# 7   3 22.1     P.2  5.9
# 8   4 19.9     P.2  5.2
# 9   1 24.3     P.3  2.1
# 10  2 23.4     P.3  2.8
# 11  3 22.1     P.3  3.1
# 12  4 19.9     P.3  2.4

With base R's reshape

reshape(df, direction = "long", 
        idvar = c("ID", "T"), 
        timevar = "Channel", 
        varying = 3:ncol(df))
#          ID    T Channel    P
# 1.24.3.1  1 24.3       1 10.2
# 2.23.4.1  2 23.4       1 10.4
# 3.22.1.1  3 22.1       1 10.5
# 4.19.9.1  4 19.9       1 10.2
# 1.24.3.2  1 24.3       2  5.5
# 2.23.4.2  2 23.4       2  5.7
# 3.22.1.2  3 22.1       2  5.9
# 4.19.9.2  4 19.9       2  5.2
# 1.24.3.3  1 24.3       3  2.1
# 2.23.4.3  2 23.4       3  2.8
# 3.22.1.3  3 22.1       3  3.1
# 4.19.9.3  4 19.9       3  2.4
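
The call above relies on reshape() guessing the split from the column names: with the default sep = ".", a column such as P.1 is read as variable P at time 1. The result also carries composite row names built from the id variables and the time value. As a minimal sketch (the object name long and the grep() selection are illustrative choices, not part of the original answer), the separator can be stated explicitly and the row names reset:

```r
# Same toy data as in the question
df <- read.table(text =
"ID T P.1 P.2 P.3
1 24.3 10.2 5.5 2.1
2 23.4 10.4 5.7 2.8
3 22.1 10.5 5.9 3.1
4 19.9 10.2 5.2 2.4
", header = TRUE)

# Select the wide columns by name rather than by position
p_cols <- grep("^P\\.", names(df), value = TRUE)

long <- reshape(df, direction = "long",
                idvar = c("ID", "T"),
                timevar = "Channel",
                varying = p_cols,
                sep = ".")      # "P.1" is split into variable "P" and time 1

# Replace the composite row names ("1.24.3.1", ...) with plain 1..n
rownames(long) <- NULL
```

Selecting columns by name keeps the call working if the column order changes, which positional indexing such as 3:ncol(df) does not guarantee.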

With "tidyr" + "dplyr"

library(dplyr)
library(tidyr)

df %>%
  gather(Channel, P, P.1:P.3) %>%
  # fixed = TRUE treats "P." literally; as a regex, "." would match any character
  mutate(Channel = gsub("P.", "", Channel, fixed = TRUE))
#    ID    T Channel    P
# 1   1 24.3       1 10.2
# 2   2 23.4       1 10.4
# 3   3 22.1       1 10.5
# 4   4 19.9       1 10.2
# 5   1 24.3       2  5.5
# 6   2 23.4       2  5.7
# 7   3 22.1       2  5.9
# 8   4 19.9       2  5.2
# 9   1 24.3       3  2.1
# 10  2 23.4       3  2.8
# 11  3 22.1       3  3.1
# 12  4 19.9       3  2.4
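
In current tidyr releases, gather() is superseded by pivot_longer(). A rough equivalent of the pipeline above, assuming a tidyr version that provides pivot_longer() (1.0.0 or later), might look like the sketch below. The object name long is an illustrative choice. Note that the result is a tibble and the rows come out grouped by original row rather than by channel.

```r
library(tidyr)

# Same toy data as in the question
df <- read.table(text =
"ID T P.1 P.2 P.3
1 24.3 10.2 5.5 2.1
2 23.4 10.4 5.7 2.8
3 22.1 10.5 5.9 3.1
4 19.9 10.2 5.2 2.4
", header = TRUE)

long <- pivot_longer(df,
                     cols = starts_with("P."),  # the wide measurement columns
                     names_to = "Channel",      # old column names go here
                     names_prefix = "P\\.",     # strip the leading "P." (regex)
                     values_to = "P")           # cell values go here
```

Dropping the names_prefix argument keeps the full header text ("P.1", "P.2", "P.3") in the Channel column, if that is closer to what the question asks for.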

Another base R option passes the wide columns as a list and names the value column explicitly through v.names:

reshape(df, direction = "long", varying = list(names(df)[3:5]),
        v.names = "Value", idvar = c("ID", "T"))
#          ID    T time Value
# 1.24.3.1  1 24.3    1  10.2
# 2.23.4.1  2 23.4    1  10.4
# 3.22.1.1  3 22.1    1  10.5
# 4.19.9.1  4 19.9    1  10.2
# 1.24.3.2  1 24.3    2   5.5
# 2.23.4.2  2 23.4    2   5.7
# 3.22.1.2  3 22.1    2   5.9
# 4.19.9.2  4 19.9    2   5.2
# 1.24.3.3  1 24.3    3   2.1
# 2.23.4.3  2 23.4    3   2.8
# 3.22.1.3  3 22.1    3   3.1
# 4.19.9.3  4 19.9    3   2.4
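
The question asks for a "Channel" column that keeps the header information. One way to get there with the same reshape() call, sketched here under the assumption that the original column names themselves are the desired labels, is to set timevar and supply the names through times (the object name long is illustrative):

```r
# Same toy data as in the question
df <- read.table(text =
"ID T P.1 P.2 P.3
1 24.3 10.2 5.5 2.1
2 23.4 10.4 5.7 2.8
3 22.1 10.5 5.9 3.1
4 19.9 10.2 5.2 2.4
", header = TRUE)

p_cols <- names(df)[3:5]          # "P.1" "P.2" "P.3"

long <- reshape(df, direction = "long",
                varying = list(p_cols),  # one long variable built from three columns
                v.names = "P",           # name of the value column
                timevar = "Channel",     # name of the label column
                times = p_cols,          # use the original headers as labels
                idvar = c("ID", "T"))
```

Without times, reshape() labels the stacked blocks 1, 2, 3 as in the output above, so the link back to the original headers depends on column order alone.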
