Group dataframe row and column wise based on other dataframe?

I have a dataframe that I would like to group in both directions, first row-wise and then column-wise. The first part worked well, but I am stuck on the second one. I would appreciate any help or advice on a solution that does both steps at the same time.

This is the dataframe:

df1 <- data.frame(
  ID = c(rep(1,5),rep(2,5)),
  ID2 = rep(c("A","B","C","D","E"),2),
  A = rnorm(10,20,1),
  B = rnorm(10,50,1),
  C = rnorm(10,10,1),
  D = rnorm(10,15,1),
  E = rnorm(10,5,1)
)

This is the second dataframe, which holds the "recipe" for grouping:

df2 <- data.frame (
  Group_1 = c("B","C"),
  Group_2 = c("D","A"),
  Group_3 = ("E"), stringsAsFactors = FALSE)

Row-wise grouping:

library(tidyverse)  # provides bind_cols(), map_df() and %>%
df1_grouped <- bind_cols(df1[1:2], map_df(df2, ~rowSums(df1[unique(.x)])))

Now I would like to apply the same grouping to the ID2 column and sum the values in the other columns. My idea was to add another column (e.g. "group") that contains the name of the final group for each ID2 value. After this I can use group_by() and summarise() to calculate the sum for each group. However, I can't figure out an automated way to do it:

bind_cols(df1_grouped,

    #add group label
    data.frame(
    group = rep(c("Group_2","Group_1","Group_1","Group_2","Group_3"),2))) %>%

    #remove temporary label column and make ID a character column
    mutate(ID2=group,
           ID=as.character(ID))%>%
    select(-group) %>%

    #summarise
    group_by(ID,ID2)%>%
    summarise_if(is.numeric, sum, na.rm = TRUE)

This is the final table I need, but I had to assign the groups manually, which is not feasible for big datasets.

Here is one such solution:

library(tidyverse)
set.seed(1)
df1 <- data.frame(
  ID = c(rep(1,5),rep(2,5)),
  ID2 = rep(c("A","B","C","D","E"),2),
  A = rnorm(10,20,1),
  B = rnorm(10,50,1),
  C = rnorm(10,10,1),
  D = rnorm(10,15,1),
  E = rnorm(10,5,1)
)

df2 <- data.frame (
  Group_1 = c("B","C"),
  Group_2 = c("D","A"),
  Group_3 = ("E"), stringsAsFactors = FALSE) 

df2 <- df2 %>% pivot_longer(everything())

df1 %>% 
  pivot_longer(-c(ID, ID2)) %>% 
  mutate(gr_r = df2$name[match(ID2, table = df2$value)],
         gr_c = df2$name[match(name, table = df2$value)]) %>% 
  arrange(ID, gr_r, gr_c) %>% 
  # name id_cols explicitly; newer tidyr versions deprecate passing it by position
  pivot_wider(id_cols = c(ID, gr_r), names_from = gr_c, values_from = value, values_fn = list(value = sum))
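
The key step above is the `match()` lookup that maps each member ("A" to "E") to its group name, applied once to the row labels and once to the column names. As a rough sketch of the same idea without reshaping, the lookup can be built as a named vector and combined with base R's `rowsum()`. The names `lookup`, `by_col`, `key` and `result`, and the hand-picked integer values in `df1`, are assumptions made for illustration and do not come from the question.

```r
# Deterministic stand-in for df1 (illustrative values, not the rnorm() data)
df1 <- data.frame(
  ID  = rep(c(1, 2), each = 5),
  ID2 = rep(c("A", "B", "C", "D", "E"), 2),
  A = 1:10, B = 11:20, C = 21:30, D = 31:40, E = 41:50,
  stringsAsFactors = FALSE
)

df2 <- data.frame(
  Group_1 = c("B", "C"),
  Group_2 = c("D", "A"),
  Group_3 = "E",            # recycled to length 2 by data.frame()
  stringsAsFactors = FALSE
)

# Named lookup vector: member -> group name
lookup <- setNames(rep(names(df2), each = nrow(df2)),
                   unlist(df2, use.names = FALSE))
lookup <- lookup[!duplicated(names(lookup))]  # drop the recycled "E"

# Step 1: collapse the value columns into their groups
m      <- as.matrix(df1[names(lookup)])
by_col <- t(rowsum(t(m), group = lookup))

# Step 2: collapse the rows by ID and by the group of ID2
key    <- paste(df1$ID, lookup[as.character(df1$ID2)], sep = "|")
result <- rowsum(by_col, group = key)

print(result)
```

The result is a matrix with one row per ID and row group (row names such as "1|Group_1") and one column per column group. Because the lookup is derived entirely from `df2`, changing the recipe needs no manual relabelling.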
