Aggregating rows across multiple values

Question

I have a large dataframe with approximately this pattern:

Person  Rate  Street  a     b     c     d     e     f
A       2     XYZ     1     NULL  3     4     5     NULL
A       2     XYZ     NULL  2     NULL  NULL  NULL  NULL
A       3     XYZ     NULL  NULL  NULL  NULL  NULL  6
B       2     DEF     NULL  NULL  NULL  NULL  5     NULL
B       2     DEF     NULL  2     3     NULL  NULL  6
C       1     DEF     1     2     3     4     5     6

a, b, c, d, e, f represent about 600 columns.

I am trying to combine the rows so that each person becomes one line, the values in columns a to f combine into a single row using sum, and any conflicting rate or street information becomes a new column. So the data should look something like this:

Person  Rate  Rate 2  Street  a     b  c  d     e  f
A       2     3       XYZ     1     2  3  4     5  6
B       2             DEF     NULL  2  3  NULL  5  6
C       1             DEF     1     2  3  4     5  6

I keep trying to make this work with aggregate and summarize, but I'm not sure that's the right approach.

Thank you very much for your help!

Answer 1

First we pivot all the unique rates per person and street.

library(reshape2)
tmp1=dcast(unique(df[,c("Person","Rate","Street")]),Person+Street~Rate,value.var="Rate")
colnames(tmp1)[-c(1:2)]=paste("Rate",colnames(tmp1)[-c(1:2)])

Then we aggregate and sum by person and street. Columns 4 to 9 here are "a" to "f"; adjust the range to match your data.

tmp2=aggregate(df[,4:9],list(Person=df$Person,Street=df$Street),function(x){
  # keep NA when every value in the group is missing, otherwise sum the rest
  if(all(is.na(x))) NA else sum(x,na.rm=TRUE)
})
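The guard inside that function matters because sum() with na.rm=TRUE returns 0, not NA, when every value is missing. A minimal sketch of the difference in base R (the helper name sum_or_na is made up for illustration and is not part of the answer):

```r
# Hypothetical helper mirroring the anonymous function passed to aggregate()
sum_or_na <- function(x) {
  if (all(is.na(x))) NA else sum(x, na.rm = TRUE)
}

all_missing  <- c(NA_integer_, NA_integer_)
some_missing <- c(NA, 2L, 3L)

plain_all  <- sum(all_missing, na.rm = TRUE)  # sums an empty set, so the result is 0
guard_all  <- sum_or_na(all_missing)          # stays NA
guard_some <- sum_or_na(some_missing)         # missing values are skipped, the rest are summed
```

Without the guard, a person with no values at all in a column would show 0 instead of NA in the aggregated result.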

And finally merge the two.

merge(tmp1,tmp2,by=c("Person","Street"))
  Person Street Rate 1 Rate 2 Rate 3  a b c  d e f
1      A    XYZ     NA      2      3  1 2 3  4 5 6
2      B    DEF     NA      2     NA NA 2 3 NA 5 6
3      C    DEF      1     NA     NA  1 2 3  4 5 6

Answer 2

Perhaps you can do this in a two-step process:

library(dplyr)
library(tidyr)

# Sum columns a to f per person
table1 <- df %>%
  group_by(Person) %>%
  summarise(across(a:f, ~ sum(.x, na.rm = TRUE)))


#Remove duplicated values and get the data in separate columns
#for Rate and Street columns.
table2 <- df %>%
  group_by(Person) %>%
  mutate(across(c(Rate, Street), ~replace(., duplicated(.), NA))) %>%
  select(Person, Rate, Street) %>%
  filter(if_any(c(Rate, Street), ~!is.na(.))) %>%
  mutate(col = row_number()) %>%
  ungroup %>%
  pivot_wider(names_from = col, values_from = c(Rate, Street)) %>%
  select(where(~any(!is.na(.))))

# Join the two tables to get the final result
inner_join(table1, table2, by = 'Person')

# Person     a     b     c     d     e     f Rate_1 Rate_2 Street_1
#  <chr>  <int> <int> <int> <int> <int> <int>  <int>  <int> <chr>   
#1 A          1     2     3     4     5     6      2      3 XYZ     
#2 B          0     2     3     0     5     6      2     NA DEF     
#3 C          1     2     3     4     5     6      1     NA DEF
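Note that this output shows 0 for person B in columns a and d, whereas the desired output in the question keeps them missing. If that distinction matters, one possible adjustment is to return NA when a group has no values at all. This is only a sketch: sum_or_na and df_small are illustrative names, and df_small is a two-column subset of the answer's data so that the block runs on its own.

```r
library(dplyr)

# Illustrative subset of the answer's data (only columns a and d)
df_small <- data.frame(
  Person = c("A", "A", "A", "B", "B", "C"),
  a = c(1L, NA, NA, NA, NA, 1L),
  d = c(4L, NA, NA, NA, NA, 4L)
)

# Hypothetical helper: NA if the whole group is missing, otherwise the sum
sum_or_na <- function(x) {
  if (all(is.na(x))) NA_integer_ else sum(x, na.rm = TRUE)
}

table1_na <- df_small %>%
  group_by(Person) %>%
  summarise(across(a:d, sum_or_na))
```

With the full data, the same helper could replace the summing function over a:f in table1; the rest of the pipeline would stay unchanged.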

data

It is easier to help when you share data in a reproducible format that can be copied directly. I used the data below for this answer.

df <- structure(list(Person = c("A", "A", "A", "B", "B", "C"), Rate = c(2L, 
2L, 3L, 2L, 2L, 1L), Street = c("XYZ", "XYZ", "XYZ", "DEF", "DEF", 
"DEF"), a = c(1L, NA, NA, NA, NA, 1L), b = c(NA, 2L, NA, NA, 
2L, 2L), c = c(3L, NA, NA, NA, 3L, 3L), d = c(4L, NA, NA, NA, 
NA, 4L), e = c(5L, NA, NA, 5L, NA, 5L), f = c(NA, NA, 6L, NA, 
6L, 6L)), row.names = c(NA, -6L), class = "data.frame")
