
Separating data by unique column values

Edit, to make the problem clearer: I start with a messy CSV file, so I need to identify which values are the ID and which are the time variable, or equivalently assign an ID and a time to the data in the variable columns. This question has now been answered below. Here is my data:

col1<-c("ID", "Date","var1","var2","ID","Date","var1","var2","ID","Date","var1","var2")
col2<-c("1","21-11-2015 14:20", "4.8","3.8", "1","21-11-2015 15:30", "3.5","5.9","2","21-11-2015 14:20","3.0","6.7")
df<-cbind(col1,col2)

I tried dcast(), with no luck:

dcast(ID+Date~var1+var2, data = df, value.var = col1 )

I would like the output to be in a true long format, like this:

ID<-c(1,1,2)
Date<-c("21-11-2015 14:20","21-11-2015 15:30","21-11-2015 14:20")
var1<-c("4.8","3.5","3.0")
var2<-c("3.8","5.9","6.7")
df.clean<-cbind(ID,Date, var1,var2)

I appreciate your help.

I don't think this is a reshape question. The names are in one column and the values in the other, so you can fill the values row-wise into a matrix with one column per unique name and then label the columns with setNames:

with(df, setNames(data.frame(matrix(col2,
          ncol = length(unique(col1)), byrow = TRUE)), unique(col1)))

#  ID             Date var1 var2
#1  1 21-11-2015 14:20  4.8  3.8
#2  1 21-11-2015 15:30  3.5  5.9
#3  2 21-11-2015 14:20  3.0  6.7

Data

col1<-c("ID", "Date","var1","var2","ID","Date","var1","var2","ID",
        "Date","var1","var2")
col2<-c("1","21-11-2015 14:20", "4.8","3.8", "1","21-11-2015 15:30", 
         "3.5","5.9","2","21-11-2015 14:20","3.0","6.7")
df<- data.frame(col1,col2)
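The matrix approach above leaves every column as text. As a follow-up, here is a minimal sketch of converting the columns to usable types. It assumes that every record lists its fields in the same order (ID, Date, var1, var2) and that the timestamps can be read as UTC; the names `fields` and `wide` and the time zone are illustrative assumptions, not part of the original answer.

```r
# Same data as above, kept as character (not factor) columns
col1 <- c("ID", "Date", "var1", "var2", "ID", "Date", "var1", "var2",
          "ID", "Date", "var1", "var2")
col2 <- c("1", "21-11-2015 14:20", "4.8", "3.8", "1", "21-11-2015 15:30",
          "3.5", "5.9", "2", "21-11-2015 14:20", "3.0", "6.7")
df <- data.frame(col1, col2, stringsAsFactors = FALSE)

# Fill the values row by row: one row per record, one column per field name
fields <- unique(df$col1)
wide <- setNames(
  data.frame(matrix(df$col2, ncol = length(fields), byrow = TRUE),
             stringsAsFactors = FALSE),
  fields
)

# Convert each column from character to a more useful type
wide$ID   <- as.integer(wide$ID)
wide$Date <- as.POSIXct(wide$Date, format = "%d-%m-%Y %H:%M", tz = "UTC")  # tz is an assumption
wide$var1 <- as.numeric(wide$var1)
wide$var2 <- as.numeric(wide$var2)

str(wide)
```

If the real CSV has records with missing or reordered fields, the row-wise fill would silently misalign values, so it may be worth checking that the number of rows in `df` is a multiple of `length(fields)` first.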

This is not a reshape question. Here is some simple code showing how to do it manually:

Data

col1<-c("ID", 
        "Date","var1","var2","ID","Date","var1","var2","ID","Date","var1","var2")
col2<-c("1","21-11-2015 14:20", "4.8","3.8", "1","21-11-2015 15:30", 
        "3.5","5.9","2","21-11-2015 14:20","3.0","6.7")
df<-data.frame(col1,col2, stringsAsFactors = FALSE)

Code

uniquevars <- unique(df$col1)
Res <- list()
# For each unique name in col1, collect the matching values from col2
for (i in seq_along(uniquevars)) {
  Res[[uniquevars[i]]] <- df[, "col2"][which(df[, "col1"] == uniquevars[i])]
}

# Bind the collected vectors side by side (matrix() fills column by column)
dfRes <- data.frame(matrix(unlist(Res), ncol = length(Res)), stringsAsFactors = FALSE)
colnames(dfRes) <- uniquevars
dfRes
#   ID             Date var1 var2
# 1  1 21-11-2015 14:20  4.8  3.8
# 2  1 21-11-2015 15:30  3.5  5.9
# 3  2 21-11-2015 14:20  3.0  6.7

I hope this code helps you understand the steps involved in what you want to do.

Cheers!
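The loop above groups the values in col2 by the name in col1, which is also what base R's split() does. The following is a rough sketch of that variant, not the answerer's own code; the names `counts`, `groups` and `dfRes2` are made up for illustration, and it assumes every field name occurs the same number of times.

```r
col1 <- c("ID", "Date", "var1", "var2", "ID", "Date", "var1", "var2",
          "ID", "Date", "var1", "var2")
col2 <- c("1", "21-11-2015 14:20", "4.8", "3.8", "1", "21-11-2015 15:30",
          "3.5", "5.9", "2", "21-11-2015 14:20", "3.0", "6.7")
df <- data.frame(col1, col2, stringsAsFactors = FALSE)

# Guard: the columns only line up if each field name occurs equally often
counts <- as.vector(table(df$col1))
stopifnot(length(unique(counts)) == 1)

# Group col2 by col1; the explicit factor levels keep the original field order
groups <- split(df$col2, factor(df$col1, levels = unique(df$col1)))

# A list of equal-length vectors converts directly to a data frame
dfRes2 <- as.data.frame(groups, stringsAsFactors = FALSE)
dfRes2
```

Without the explicit `levels`, split() would order the groups alphabetically (Date before ID), which is harmless but changes the column order.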

Here's a tidyverse approach:

library(tidyverse)

df %>%                                # your original (cbind) object
  data.frame() %>%                    # set as dataframe
  group_by(col1) %>%                  # for each col1 value
  mutate(index = row_number()) %>%    # set a row index (useful for reshaping)
  spread(col1, col2) %>%              # reshape
  select(-index)                      # remove index

# # A tibble: 3 x 4
#   Date             ID    var1  var2 
#   <fct>            <fct> <fct> <fct>
# 1 21-11-2015 14:20 1     4.8   3.8  
# 2 21-11-2015 15:30 1     3.5   5.9  
# 3 21-11-2015 14:20 2     3.0   6.7 
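In current tidyr, spread() is superseded by pivot_wider(). Below is a sketch of the same pipeline written with pivot_wider(), assuming tidyr 1.0.0 or later and dplyr are installed; the result name `res` is illustrative. The idea is unchanged: number the occurrences of each field name so that the n-th ID, Date, var1 and var2 end up in the same row.

```r
library(dplyr)
library(tidyr)

col1 <- c("ID", "Date", "var1", "var2", "ID", "Date", "var1", "var2",
          "ID", "Date", "var1", "var2")
col2 <- c("1", "21-11-2015 14:20", "4.8", "3.8", "1", "21-11-2015 15:30",
          "3.5", "5.9", "2", "21-11-2015 14:20", "3.0", "6.7")
df <- data.frame(col1, col2, stringsAsFactors = FALSE)

res <- df %>%
  group_by(col1) %>%                                     # for each field name
  mutate(index = row_number()) %>%                       # n-th occurrence = n-th record
  ungroup() %>%
  pivot_wider(names_from = col1, values_from = col2) %>% # one column per field name
  select(-index)                                         # drop the helper index

res
```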
