
How can I convert this long format dataframe into a wide format?

I am using RStudio for data analysis in R. I currently have a dataframe in long format, and I want to convert it into wide format.

An extract of the dataframe (df1) is shown below. I have converted the first column into a factor.

Extract:

df1 <- read.csv("test1.csv", stringsAsFactors = FALSE, header = TRUE)

df1$Respondent <- factor(df1$Respondent)

df1

      Respondent  Question      CS             Imp     LOS  Type  Hotel
1          1       Q1       Fully Applied     High     12   SML   ABC
2          1       Q2       Optimized         Critical 12   SML   ABC

I want a new dataframe (say, df2) to look like this:

Respondent      Q1CS           Q1Imp     Q2CS        Q2Imp     LOS   Type   Hotel
  1          Fully Applied      High    Optimized    Critical   12   SML    ABC

How can I do this in R?

Additional notes: I have looked at the tidyr package and its spread() function, but I am having a hard time applying it to this specific problem.

This can be achieved with a gather-unite-spread approach:

library(dplyr)
library(tidyr)

df %>%
    group_by(Respondent) %>%
    gather(k, v, CS, Imp) %>%
    unite(col, Question, k, sep = "") %>%
    spread(col, v)
#  Respondent LOS Type Hotel          Q1CS Q1Imp      Q2CS    Q2Imp
#1          1  12  SML   ABC Fully Applied  High Optimized Critical

Sample data

df <- read.table(text =
    "      Respondent  Question      CS             Imp     LOS  Type  Hotel
1          1       Q1       'Fully Applied'     High     12   SML   ABC
2          1       Q2       'Optimized'         Critical 12   SML   ABC", header = TRUE)
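
As a side note, gather() and spread() are superseded in tidyr 1.0.0 and later by pivot_longer() and pivot_wider(). The following is a minimal sketch of the same reshape with pivot_wider(), assuming tidyr >= 1.0.0 is installed; the sample data is rebuilt inline so the block runs on its own, and the name df2 is taken from the question. It is an alternative to the approach above, not a replacement the original answer proposed.

```r
library(tidyr)

# Sample data rebuilt inline (same values as the extract in the question)
df <- data.frame(
  Respondent = c(1, 1),
  Question   = c("Q1", "Q2"),
  CS         = c("Fully Applied", "Optimized"),
  Imp        = c("High", "Critical"),
  LOS        = c(12, 12),
  Type       = c("SML", "SML"),
  Hotel      = c("ABC", "ABC"),
  stringsAsFactors = FALSE
)

# Spread CS and Imp across one column per Question.
# names_glue builds names such as "Q1CS" and "Q2Imp".
# All remaining columns (Respondent, LOS, Type, Hotel) identify the output rows.
df2 <- pivot_wider(
  df,
  names_from  = Question,
  values_from = c(CS, Imp),
  names_glue  = "{Question}{.value}"
)

df2
```

Because LOS, Type and Hotel are constant within each Respondent in this sample, they act as identifier columns and the result has one row per respondent. If they varied within a respondent, the result would have more than one row for that respondent.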

In data.table, this can be done in a one-liner:

dcast(DT, Respondent ~ Question, value.var = c("CS", "Imp"), sep = "")[DT, `:=`(LOS = i.LOS, Type = i.Type, Hotel = i.Hotel), on = "Respondent"][]
#    Respondent          CSQ1      CSQ2 ImpQ1    ImpQ2 LOS Type Hotel
# 1:          1 Fully Applied Optimized  High Critical  12  SML   ABC

Explained step by step

Create sample data

library(data.table)

DT <- fread("Respondent  Question      CS             Imp     LOS  Type  Hotel
             1  Q1       'Fully Applied'     High     12   SML   ABC
            1   Q2       'Optimized'         Critical 12   SML   ABC", quote = '\'')

Cast part of the data.table to the desired format, by question.
The column names might not be what you want; you can always change them using setnames().

dcast(DT, Respondent ~ Question, value.var = c("CS", "Imp"), sep = "")
#    Respondent          CSQ1      CSQ2 ImpQ1    ImpQ2
# 1:          1 Fully Applied Optimized  High Critical
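
The renaming mentioned above could look roughly like this. It is a sketch that assumes the cast columns always have the form CS or Imp followed by the question label (for example CSQ1), and it swaps the two parts so the names match the layout asked for in the question (Q1CS, Q1Imp, ...). The variable names wide, old and new are illustrative only.

```r
library(data.table)

# Sample data rebuilt inline so the block runs on its own
DT <- data.table(
  Respondent = c(1L, 1L),
  Question   = c("Q1", "Q2"),
  CS         = c("Fully Applied", "Optimized"),
  Imp        = c("High", "Critical")
)

wide <- dcast(DT, Respondent ~ Question, value.var = c("CS", "Imp"), sep = "")

# Swap "<variable><question>" to "<question><variable>", e.g. CSQ1 -> Q1CS
old <- setdiff(names(wide), "Respondent")
new <- sub("^(CS|Imp)(Q[0-9]+)$", "\\2\\1", old)
setnames(wide, old, new)

names(wide)
```

setnames() modifies the table by reference, so no reassignment is needed.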

Then join by reference on the original DT to get the rest of the columns you need:

# result.from.dcast stands for the output of the dcast() call above
result.from.dcast[DT, `:=`(LOS = i.LOS, Type = i.Type, Hotel = i.Hotel), on = "Respondent"]
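
Putting the two steps together, a minimal runnable sketch might look like the following. It also shows a variant that is not part of the original answer: when LOS, Type and Hotel are constant within each Respondent (as they are in this sample), they can be placed on the left-hand side of the dcast() formula, which avoids the join altogether. The names joined and alt are illustrative only.

```r
library(data.table)

# Sample data rebuilt inline so the block runs on its own
DT <- data.table(
  Respondent = c(1L, 1L),
  Question   = c("Q1", "Q2"),
  CS         = c("Fully Applied", "Optimized"),
  Imp        = c("High", "Critical"),
  LOS        = c(12L, 12L),
  Type       = c("SML", "SML"),
  Hotel      = c("ABC", "ABC")
)

# Step 1: cast CS and Imp to wide format, one row per Respondent
joined <- dcast(DT, Respondent ~ Question, value.var = c("CS", "Imp"), sep = "")

# Step 2: update join, adding the remaining columns by reference
joined[DT, `:=`(LOS = i.LOS, Type = i.Type, Hotel = i.Hotel), on = "Respondent"]

# Variant (assumption: LOS, Type, Hotel do not vary within a Respondent):
# keep them as identifier columns in the formula instead of joining afterwards
alt <- dcast(DT, Respondent + LOS + Type + Hotel ~ Question,
             value.var = c("CS", "Imp"), sep = "")

joined
alt
```

Both results hold the same columns, though in a different order. If the extra columns can differ between rows of the same respondent, the variant yields several rows per respondent, while the join keeps the values from the last matching row.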

