
How to aggregate multiple columns of a dataframe with dplyr

I have a dataframe with four columns: id, category, cost and colour.

Here is the dataframe df:

library(dplyr)

id <- c(1, 1, 1, 2, 2, 3, 1) 
category <- c("V", "V", "V", "W", "W", "W", "W")
cost <- c(10, 15, 5, 2, 14, 20, 3)
colour <- c("red", "green", "red", "green", "blue","blue","blue")

df <- data.frame(id, category, cost, colour)
df$category <- as.character(df$category)

df
id    category    cost     colour
1     V           10       red
1     V           15       green
1     V           5        red
2     W           2        green
2     W           14       blue
3     W           20       blue
1     W           3        blue

Here is the structure of df:

'data.frame':   7 obs. of  4 variables:
 $ id       : num  1 1 1 2 2 3 1
 $ category : chr  "V" "V" "V" "W" ...
 $ cost     : num  10 15 5 2 14 20 3
 $ colour   : Factor w/ 3 levels "blue","green",..: 3 2 3 2 1 1 1

I would like to have a new dataframe df_new that contains, for each id: the frequency (freq), the number of category entries equal to W (category_W), the number of category entries equal to V (category_V), the total cost where the category is W (cost_W), the total cost where the category is V (cost_V), and the number of entries of each colour (col_red, col_green, col_blue). The output should look like this:

id  freq  category_W  category_V  cost_W  cost_V  col_red  col_green  col_blue
1   4     1           3           3       30      2        1          1
2   2     2                       16                       1          1
3   1     1                       20                                  1

I tried the following, but it doesn't work:

df_new <- group_by(df, id) %>% summarize(freq = count(id), category_W = count(category == "W", na.rm=TRUE), category_V = count(category == "V", na.rm=TRUE), col_red = count(colour == "red", na.rm=TRUE), col_green = count(colour == "green", na.rm=TRUE),  col_blue = count(colour == "blue", na.rm=TRUE))    

I have no clue how I can insert the condition for cost_W and cost_V. I get the error: length(rows) == 1 is not TRUE. Thanks a lot in advance!

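Well, you are almost there. The main problem is count(): it operates on a data frame, whereas inside summarize() you need functions that reduce a vector to a single value, such as n() and sum().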

You can take advantage of the fact that logical values are converted to numbers in arithmetic operations: FALSE becomes 0 and TRUE becomes 1. So when you sum a logical expression, you get the number of rows for which it is TRUE.
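As a small illustration of that coercion, here is a sketch in base R that counts the W entries over the whole category column from the question, without any grouping. The names is_w and n_w are only illustrative.

```r
# Values of the category column from the question's df
category <- c("V", "V", "V", "W", "W", "W", "W")

# Comparing gives a logical vector: FALSE FALSE FALSE TRUE TRUE TRUE TRUE
is_w <- category == "W"

# sum() treats TRUE as 1 and FALSE as 0, so this counts the matches
n_w <- sum(is_w)
n_w
# → 4
```

Inside summarize() the same expression is evaluated once per group, which is what turns it into a per-id count.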

You can use the same property to calculate the cost. Just multiply the logical expression by the cost variable: rows whose category matches keep their cost, and all other rows contribute 0 to the sum.
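A minimal sketch of that masking trick on the ungrouped columns from the question. The variable names are illustrative, and the subsetting variant at the end is an alternative way to write the same conditional sum, not part of the original answer.

```r
# Columns taken from the question's df
category <- c("V", "V", "V", "W", "W", "W", "W")
cost     <- c(10, 15, 5, 2, 14, 20, 3)

# Multiplying by the logical vector zeroes out the non-matching rows:
# 0 0 0 2 14 20 3
masked <- (category == "W") * cost

cost_w <- sum(masked)                      # 2 + 14 + 20 + 3 = 39
cost_v <- sum((category == "V") * cost)    # 10 + 15 + 5 = 30

# Alternative: subset cost by the condition before summing
cost_w_subset <- sum(cost[category == "W"])
```

Both forms give the same totals here. The subsetting form may read more naturally to some, while the multiplication form mirrors the counting expressions.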

df_new <-
    group_by(df, id) %>% summarize(
      freq = n(),
      category_W = sum(category == "W", na.rm = TRUE),
      category_V = sum(category == "V", na.rm = TRUE),
      cost_W = sum((category == "W") * cost, na.rm = TRUE),
      cost_V = sum((category == "V") * cost, na.rm = TRUE),
      col_red = sum(colour == "red", na.rm = TRUE),
      col_green = sum(colour == "green", na.rm = TRUE),
      col_blue = sum(colour == "blue", na.rm = TRUE)
  )
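To check the result against the desired output, the whole thing can be run end to end. The sketch below rebuilds df from the question and assumes dplyr is installed. Note that this approach yields 0 where the desired table in the question shows blank cells.

```r
library(dplyr)

# Rebuild the example data from the question
df <- data.frame(
  id       = c(1, 1, 1, 2, 2, 3, 1),
  category = c("V", "V", "V", "W", "W", "W", "W"),
  cost     = c(10, 15, 5, 2, 14, 20, 3),
  colour   = c("red", "green", "red", "green", "blue", "blue", "blue")
)

df_new <- df %>%
  group_by(id) %>%
  summarize(
    freq       = n(),                                   # rows per id
    category_W = sum(category == "W", na.rm = TRUE),    # count of W per id
    category_V = sum(category == "V", na.rm = TRUE),    # count of V per id
    cost_W     = sum((category == "W") * cost, na.rm = TRUE),
    cost_V     = sum((category == "V") * cost, na.rm = TRUE),
    col_red    = sum(colour == "red", na.rm = TRUE),
    col_green  = sum(colour == "green", na.rm = TRUE),
    col_blue   = sum(colour == "blue", na.rm = TRUE)
  )

print(as.data.frame(df_new))
# Values per id (freq, category_W, category_V, cost_W, cost_V, red, green, blue):
#   id 1: 4, 1, 3,  3, 30, 2, 1, 1
#   id 2: 2, 2, 0, 16,  0, 0, 1, 1
#   id 3: 1, 1, 0, 20,  0, 0, 0, 1
```

If blanks are preferred over zeros for presentation, that would be a separate formatting step after the summary.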
