
Combine two rows into one based on column value in R

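Please ignore this part and skip to "Start here" below.

I am trying to combine the following two rows:

[image]

into a single row like this:

[image]

Here is the code to create the dataset: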

dataset <- data.frame(Environment=c("PRODUCTION","PRODUCTION"),
                      Green=c("Yes","No"),
                      Red=c("No","Yes"),
                      Completed=c("Yes","Yes"))

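If the Environment column has the same value in both rows (PRODUCTION in this case), the two rows should be merged, with each column returning "Yes" if either row has "Yes". I have not included my attempts because none of them worked properly. For example, the following code removes the duplicate row, but it simply keeps the first one: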

dataset[!duplicated(dataset$Environment),]
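To make the shortcoming concrete, here is a minimal sketch using the same two-row data frame. The name first_only is only illustrative. It shows that the "Yes" in the Red column of the second row is lost, because !duplicated() keeps just the first row per Environment:

```r
# Same two-row data frame as in the question
dataset <- data.frame(Environment = c("PRODUCTION", "PRODUCTION"),
                      Green = c("Yes", "No"),
                      Red = c("No", "Yes"),
                      Completed = c("Yes", "Yes"))

# !duplicated() keeps only the first row for each Environment value
first_only <- dataset[!duplicated(dataset$Environment), ]

# Red is "No" here, although the dropped second row had Red = "Yes"
print(first_only)
```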

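Any help would be greatly appreciated.

Start here - question updated

I realize that my question did not reflect the problem I am trying to solve. Let me try again. Here is the dataset:

[image]

I would like it to look like this:

[image]

There may be many other columns. However, all I want to do is this: if the same ID has the same Environment in several rows, combine those rows and return Yes in a column if any of the rows has Yes, otherwise return the default value. I hope this is worded much better.

Here is the new dataset: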

dataset <- data.frame(ID=c(15,15,15,16,16,16,16),Environment=c("PRODUCTION","PRODUCTION", "TRAINING",
                                                               "PRODUCTION","PRODUCTION", "TRAINING", "STAGING"),
                      Green=c("Yes","No", "Yes","Yes","No", "Yes", "Yes"),
                      Red=c("No","Yes", "No","No","Yes", "No", "No"),
                      Completed=c("Yes","Yes", "No","Yes","Yes", "No", "No"))

Based on @P.Routh's code, I think we are getting closer. I modified the dataset to show that a static signature will break the code:

dataset <- data.frame(ID=c(15,15,15,16,16,16,16),
                      Environment=c("PRODUCTION","PRODUCTION", "TRAINING",
                      "PRODUCTION","PRODUCTION", "TRAINING", "STAGING"),
                      Green=c("Yes","No", "Yes","Yes","No", "No", "Yes"),
                      Red=c("No","Yes", "No","No","Yes", "No", "No"),
                      White=c("No","No", "No","No","No", "No", "No"),
                      Black=c("No","No", "No","No","No", "No", "No"),
                      Completed=c("Yes","Yes", "No","Yes","Yes", "No", "No"))

With that, I would like it to look like this: [image]

The modified version of @P.Routh's code below produces the wrong output:

df <- dataset%>%group_by(ID,Environment)%>%
  mutate(total = n())%>%  #this counter acts as the condition you need
  unite(signature,Green,Red,White,Black,Completed,sep = ":")%>% #combines the columns into one column
  mutate(dummy = "Yes:Yes:Yes:Yes:Yes")%>% #just a dummy column to facilitate specifying the condition
  mutate(new_val = ifelse(total>1,dummy,signature))%>% #this is the condition
  select(-signature:-dummy)%>%
  separate(new_val, c("Green","Red","White","Black","Completed"),":") #restores original output
unique(df)

Try using dplyr and zoo.

First approach

library(dplyr)
library(zoo)
dataset[dataset=='No']=NA  
dataset%>%group_by(Environment)%>%mutate_each(funs(na.locf))%>%filter(row_number()==n())

  Environment  Green    Red Completed
       <fctr> <fctr> <fctr>    <fctr>
1  PRODUCTION    Yes    Yes       Yes

Second approach, from @eipi10

dataset %>% group_by(Environment) %>% summarise_all(funs(max(as.character(.)))) 

#For the detail 
    #'Yes'>'No'
    #[1] TRUE

    #max('Yes','No')
    #[1] "Yes"
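For readers on newer dplyr releases (1.0.0 or later, where across() is available), funs() is deprecated and summarise_all() has been superseded. The following is a sketch of the same idea written with across(), not the answerer's original code. It uses the two-row dataset from the question, and the name collapsed is only illustrative:

```r
library(dplyr)

# Two-row data frame from the original question
dataset <- data.frame(Environment = c("PRODUCTION", "PRODUCTION"),
                      Green = c("Yes", "No"),
                      Red = c("No", "Yes"),
                      Completed = c("Yes", "Yes"))

# "Yes" sorts after "No", so max() returns "Yes" whenever a group contains one
collapsed <- dataset %>%
  group_by(Environment) %>%
  summarise(across(everything(), ~ max(as.character(.x))), .groups = "drop")

print(collapsed)
```

Inside a grouped summarise(), everything() covers only the non-grouping columns, so Environment is left alone.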

In base R, you can use aggregate like this:

aggregate(dataset[-1], dataset["Environment"], function(x) max(as.character(x)))

which returns

  Environment Green Red Completed
1  PRODUCTION   Yes Yes       Yes

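It seems the question was changed after I answered. However, a very small change to my original code gives the desired output (with some reordering of the rows):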

aggregate(dataset[-(1:2)], dataset[c("Environment", "ID")], 
          function(x) max(as.character(x)))

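Note that this relies on the strings being ordered so that the success value ("Yes") comes after the failure value ("No") lexicographically, which is why max returns "Yes". If the order were reversed, you could take the minimum instead. Also, numeric codes are easier to work with than text in this kind of situation; an alternative solution is to convert the text to numbers and then do the same thing.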
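The numeric-code idea could look roughly like the sketch below. It is an illustration under assumptions rather than the answerer's code: it recodes "Yes"/"No" as TRUE/FALSE, uses any() per group, and converts back to labels afterwards. The names flag_cols, flags and collapsed are hypothetical, and the data is the updated dataset from the question:

```r
# Updated dataset from the question (ID plus Environment identify a group)
dataset <- data.frame(ID = c(15, 15, 15, 16, 16, 16, 16),
                      Environment = c("PRODUCTION", "PRODUCTION", "TRAINING",
                                      "PRODUCTION", "PRODUCTION", "TRAINING", "STAGING"),
                      Green = c("Yes", "No", "Yes", "Yes", "No", "Yes", "Yes"),
                      Red = c("No", "Yes", "No", "No", "Yes", "No", "No"),
                      Completed = c("Yes", "Yes", "No", "Yes", "Yes", "No", "No"))

# Every column that is not a grouping key is treated as a Yes/No flag
flag_cols <- setdiff(names(dataset), c("ID", "Environment"))

# Recode "Yes"/"No" as TRUE/FALSE so the result no longer depends on string order
flags <- as.data.frame(lapply(dataset[flag_cols],
                              function(x) as.character(x) == "Yes"))

# any() is TRUE as soon as one row of the group is TRUE
collapsed <- aggregate(flags, dataset[c("Environment", "ID")], any)

# Convert the logical results back to the original labels
collapsed[flag_cols] <- lapply(collapsed[flag_cols],
                               function(x) ifelse(x, "Yes", "No"))

print(collapsed)
```

Because the comparison is done on logical values, this version does not depend on how the two labels happen to sort.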

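A solution using dplyr. The key is to set the factor levels for all columns except Environment. After that, summarise the columns with min. mutate_at and summarise_all can do this efficiently.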

# Load package
library(dplyr)

# Process the data
dataset2 <- dataset %>%
  # Set factor level to all columns except Environment
  mutate_at(vars(-Environment), factor, levels = c("Yes", "No"), ordered = TRUE) %>%
  group_by(Environment) %>%
  summarise_all(funs(min(.)))
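To see why min() works here, consider this small base R sketch (the vectors x and all_no are made up for illustration). With the levels given as c("Yes", "No") and ordered = TRUE, "Yes" is the lowest level, so min() returns it whenever it is present:

```r
# "Yes" is listed first, so it is the smallest level of the ordered factor
x <- factor(c("No", "Yes", "No"), levels = c("Yes", "No"), ordered = TRUE)
lowest <- min(x)
print(as.character(lowest))

# A group without any "Yes" falls back to "No"
all_no <- factor(c("No", "No"), levels = c("Yes", "No"), ordered = TRUE)
print(as.character(min(all_no)))
```

One caveat, stated as an assumption about how this answer would be adapted: it was written for the original dataset, so with the updated data the ID column would presumably need to be excluded from mutate_at and added to group_by, since converting ID to a factor with levels "Yes" and "No" would turn it into NA.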

I hope it is not too late. My solution uses dplyr and tidyr:

library(dplyr)
library(tidyr)

df <- dataset%>%group_by(ID,Environment)%>%
  mutate(total = n())%>%  #this counter acts as the condition you need
  unite(signature,Green,Red,Completed,sep = ":")%>% #combines the columns into one column
  mutate(dummy = "Yes:Yes:Yes")%>% #just a dummy column to facilitate specifying the condition
  mutate(new_val = ifelse(total>1,dummy,signature))%>% #this is the condition
  select(-signature:-dummy)%>%
  separate(new_val, c("Green","Red","Completed"),":") #restores original output
unique(df)

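Thanks to @P.Routh, @Wen and @eipi10. I took all of your ideas and came up with working code that can be used on large datasets. Here is the dataset posted above and the working code: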

#load library
library(dplyr)

#create dataframe
dataset <- data.frame(ID=c(15,15,15,16,16,16,16),
                      Environment=c("PRODUCTION","PRODUCTION", "TRAINING",
                      "PRODUCTION","PRODUCTION", "TRAINING", "STAGING"),
                      Green=c("Yes","No", "Yes","Yes","No", "No", "Yes"),
                      Red=c("No","Yes", "No","No","Yes", "No", "No"),
                      White=c("No","No", "No","No","No", "No", "No"),
                      Black=c("No","No", "No","No","No", "No", "No"),
                      Completed=c("Yes","Yes", "No","Yes","Yes", "No", "No"))


# Add a per-group row count to flag duplicated ID/Environment pairs
df <- dataset %>% group_by(ID, Environment) %>% mutate(total = n())

ddc <- df[df$total == 1, ]  # groups without duplicates
ddd <- df[df$total > 1, ]   # groups with duplicates (two or more rows)

# Collapse the duplicated groups; "Yes" sorts after "No", so max() keeps "Yes"
ddd <- ddd %>% group_by(ID, Environment) %>% summarise_all(funs(max(as.character(.))))

merge(ddc, ddd, all = TRUE)

Thank you all.

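Update

I thought about this some more and realized that I do not need all the extra steps to collapse the rows. As long as you supply the unique identifiers, e.g. group_by(ID, Environment), the integrity of your data is preserved. I went a step further and modified the dataset to test this. See the new solution below: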

dataset <- data.frame(ID=c(15,15,15,15,16,16,16,16),
                      Environment=c("PRODUCTION","PRODUCTION","PRODUCTION", "TRAINING",
                                    "PRODUCTION","PRODUCTION", "TRAINING", "STAGING"),
                      Green=c("Yes","No", "Yes", "Yes","Yes","No", "No", "Yes"),
                      Red=c("No","Yes", "No", "No","No","Yes", "No", "No"),
                      White=c("No","No", "Yes","Yes","No","No", "No", "No"),
                      Black=c("No","No", "No","No","No","No", "No", "No"),
                      Completed=c("Yes","Yes", "No","No","Yes","Yes", "No", "No"))

dataset%>% group_by(ID,Environment) %>% summarise_all(funs(max(as.character(.))))

