
Aggregating rows across multiple values

I have a large data frame with approximately this pattern:

Person  Rate  Street  a     b     c     d     e     f
A       2     XYZ     1     NULL  3     4     5     NULL
A       2     XYZ     NULL  2     NULL  NULL  NULL  NULL
A       3     XYZ     NULL  NULL  NULL  NULL  NULL  6
B       2     DEF     NULL  NULL  NULL  NULL  5     NULL
B       2     DEF     NULL  2     3     NULL  NULL  6
C       1     DEF     1     2     3     4     5     6

The columns a, b, c, d, e, f stand in for about 600 columns.

I am trying to combine the rows so that each person becomes one line: the value columns a to f are combined into a single line using sum, and any conflicting rate or street information becomes a new column. So the data should look something like this:

Person  Rate  Rate 2  Street  a     b  c  d     e  f
A       2     3       XYZ     1     2  3  4     5  6
B       2             DEF     NULL  2  3  NULL  5  6
C       1             DEF     1     2  3  4     5  6

I keep trying to make this work with aggregate and summarize, but I'm not sure that's the right approach.

Thank you very much for your help!

First we pivot all the unique rates per person and street into columns.

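library(reshape2)
# One row per Person/Street, one column per distinct Rate value
tmp1 <- dcast(unique(df[, c("Person", "Rate", "Street")]), Person + Street ~ Rate, value.var = "Rate")
# Prefix the rate columns (everything after Person and Street) with "Rate"
colnames(tmp1)[-c(1:2)] <- paste("Rate", colnames(tmp1)[-c(1:2)])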

Then we aggregate by person and street, summing columns 4 to 9 ("a" to "f"; change the range to match your data). A group whose values are all NA stays NA instead of becoming 0.

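# Sum each value column per Person/Street; keep NA when a group has no values at all
tmp2 <- aggregate(df[, 4:9], list(Person = df$Person, Street = df$Street), function(x) {
  ifelse(all(is.na(x)), NA, sum(x, na.rm = TRUE))
})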
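Because the real data has about 600 value columns, hard-coding df[, 4:9] is fragile. A minimal base-R sketch, assuming the key columns are exactly Person, Rate and Street and every other column should be summed, selects the value columns by name instead. The helper name sum_or_na is hypothetical, and the small data frame is a cut-down stand-in for the question's data.

```r
# Hypothetical helper: NA when every value in a group is missing, otherwise the sum of the rest
sum_or_na <- function(x) if (all(is.na(x))) NA else sum(x, na.rm = TRUE)

# Cut-down stand-in for the question's data (only two of the value columns)
df <- data.frame(
  Person = c("A", "A", "A", "B", "B", "C"),
  Rate   = c(2L, 2L, 3L, 2L, 2L, 1L),
  Street = c("XYZ", "XYZ", "XYZ", "DEF", "DEF", "DEF"),
  a      = c(1L, NA, NA, NA, NA, 1L),
  d      = c(4L, NA, NA, NA, NA, 4L)
)

# Every column that is not a key column is treated as a value column
value_cols <- setdiff(names(df), c("Person", "Rate", "Street"))

tmp2 <- aggregate(df[value_cols],
                  by = list(Person = df$Person, Street = df$Street),
                  FUN = sum_or_na)
print(tmp2)
```

Selecting by name keeps working when value columns are added or reordered, which matters once there are hundreds of them.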

And finally we merge the two.

merge(tmp1,tmp2,by=c("Person","Street"))
  Person Street Rate 1 Rate 2 Rate 3  a b c  d e f
1      A    XYZ     NA      2      3  1 2 3  4 5 6
2      B    DEF     NA      2     NA NA 2 3 NA 5 6
3      C    DEF      1     NA     NA  1 2 3  4 5 6
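The merged result has one column per distinct rate value (Rate 1, Rate 2, Rate 3) rather than the Rate / Rate 2 layout asked for in the question. A minimal base-R sketch of one way to close that gap, assuming the rate columns are named as in the output above: for each row, keep the non-missing rates in column order and shift them left. The function name compact_rates is hypothetical, and its input is typed in by hand to mirror the merge output.

```r
# Hypothetical helper: turn one-column-per-rate-value into Rate, Rate 2, ... columns
compact_rates <- function(wide, rate_cols) {
  vals <- as.matrix(wide[rate_cols])
  # The row with the most distinct rates decides how many Rate columns are needed
  k <- max(rowSums(!is.na(vals)))
  # Per row: keep the non-missing rates in column order, pad with NA up to width k
  rows <- lapply(seq_len(nrow(vals)), function(i) {
    v <- vals[i, !is.na(vals[i, ])]
    c(v, rep(NA, k - length(v)))
  })
  out <- matrix(unlist(rows), nrow = nrow(vals), byrow = TRUE)
  colnames(out) <- c("Rate", if (k > 1) paste("Rate", seq(2, k)))
  # Non-rate columns first, then the compacted rate columns
  cbind(wide[setdiff(names(wide), rate_cols)], out)
}

# Hand-typed copy of the merge() result above (value columns omitted)
merged <- data.frame(
  Person   = c("A", "B", "C"),
  Street   = c("XYZ", "DEF", "DEF"),
  `Rate 1` = c(NA, NA, 1L),
  `Rate 2` = c(2L, 2L, NA),
  `Rate 3` = c(3L, NA, NA),
  check.names = FALSE
)

res <- compact_rates(merged, c("Rate 1", "Rate 2", "Rate 3"))
print(res)
```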

Perhaps you can do this in a two-step process:

library(dplyr)
library(tidyr)

# Sum columns a to f per person
table1 <- df %>%
  group_by(Person) %>%
  summarise(across(a:f, ~ sum(.x, na.rm = TRUE)))


#Remove duplicated values and get the data in separate columns
#for Rate and Street columns.
table2 <- df %>%
  group_by(Person) %>%
  mutate(across(c(Rate, Street), ~replace(., duplicated(.), NA))) %>%
  select(Person, Rate, Street) %>%
  filter(if_any(c(Rate, Street), ~!is.na(.))) %>%
  mutate(col = row_number()) %>%
  ungroup %>%
  pivot_wider(names_from = col, values_from = c(Rate, Street)) %>%
  select(where(~any(!is.na(.))))
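For comparison, the dedupe-and-number idea behind table2 can be sketched in base R without tidyr. This is a minimal sketch that assumes only Rate conflicts within a person (Street would need the same treatment if it could also differ); it uses ave() to number each person's distinct rates and stats::reshape() to widen them into Rate_1, Rate_2, and so on.

```r
df <- data.frame(
  Person = c("A", "A", "A", "B", "B", "C"),
  Rate   = c(2L, 2L, 3L, 2L, 2L, 1L)
)

# Keep the first occurrence of each Person/Rate pair
u <- df[!duplicated(df[c("Person", "Rate")]), ]

# Number the distinct rates within each person: 1, 2, ...
u$col <- ave(u$Rate, u$Person, FUN = seq_along)

# One row per person, with columns Rate_1, Rate_2, ...
wide <- reshape(u, idvar = "Person", timevar = "col",
                direction = "wide", sep = "_")
print(wide)
```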

# Join the two tables to get the final result
inner_join(table1, table2, by = 'Person')

# Person     a     b     c     d     e     f Rate_1 Rate_2 Street_1
#  <chr>  <int> <int> <int> <int> <int> <int>  <int>  <int> <chr>   
#1 A          1     2     3     4     5     6      2      3 XYZ     
#2 B          0     2     3     0     5     6      2     NA DEF     
#3 C          1     2     3     4     5     6      1     NA DEF     
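In this output B's a and d come out as 0 rather than the NULL shown in the question's desired result, because sum(x, na.rm = TRUE) of an all-NA vector is 0. One base-R way to keep the NA across many columns at once is sketched below, assuming the value columns are everything except Person, Rate and Street: sum each group with rowsum(), count the non-missing values per group the same way, and put NA back wherever that count is zero. This is a swapped-in technique, not part of the answer above.

```r
df <- data.frame(
  Person = c("A", "A", "A", "B", "B", "C"),
  Rate   = c(2L, 2L, 3L, 2L, 2L, 1L),
  Street = c("XYZ", "XYZ", "XYZ", "DEF", "DEF", "DEF"),
  a = c(1L, NA, NA, NA, NA, 1L),
  b = c(NA, 2L, NA, NA, 2L, 2L),
  f = c(NA, NA, 6L, NA, 6L, 6L)
)

# All columns other than the key columns are summed
vals <- as.matrix(df[setdiff(names(df), c("Person", "Rate", "Street"))])

# Per-person sums (an all-NA group sums to 0 here) ...
sums <- rowsum(vals, df$Person, na.rm = TRUE)
# ... and per-person counts of non-missing values (unary + turns TRUE/FALSE into 1/0)
counts <- rowsum(+!is.na(vals), df$Person)

# Restore NA where a person had no values in that column
sums[counts == 0] <- NA
print(sums)
```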

Data

It is much easier to help when you share data in a reproducible format that can be copied directly. I used the data below for this answer.

df <- structure(list(Person = c("A", "A", "A", "B", "B", "C"), Rate = c(2L, 
2L, 3L, 2L, 2L, 1L), Street = c("XYZ", "XYZ", "XYZ", "DEF", "DEF", 
"DEF"), a = c(1L, NA, NA, NA, NA, 1L), b = c(NA, 2L, NA, NA, 
2L, 2L), c = c(3L, NA, NA, NA, 3L, 3L), d = c(4L, NA, NA, NA, 
NA, 4L), e = c(5L, NA, NA, 5L, NA, 5L), f = c(NA, NA, 6L, NA, 
6L, 6L)), row.names = c(NA, -6L), class = "data.frame")
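The structure(...) text above is what dput() prints for a data frame, which is why it can be pasted straight into R. A minimal sketch of that round trip, using a small made-up data frame: capture dput()'s output as text, parse it back, and confirm the result is identical to the original. In practice, running dput(df), or dput(head(df)) for a large frame, and pasting the printed output into the question is enough.

```r
df <- data.frame(Person = c("A", "B"), Rate = c(2L, 1L), a = c(1L, NA))

# What dput() prints is R code that recreates the object
txt <- paste(capture.output(dput(df)), collapse = "\n")
cat(txt, "\n")

# Evaluating that text rebuilds an identical data frame
df2 <- eval(parse(text = txt))
```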

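Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.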

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM