Calculating a category-specific variable in R

I have a large data set with col_1 as the first category and col_2 as the second category. I am attaching a sample (see the table below). The data consists of the first four columns (col_1, col_2, ice, fd). I want to generate the variable "ice_new" for each category of col_1: take the sum of column fd within the category as the denominator, divide the "ice" value of each col_2 entry by it, and add the results up. I tried using the aggregate function in R, but it doesn't work. How do I do this in R? Any help will be appreciated.

col_1   col_2   ice   fd    ice_new
A       A1      0.3   0.1   0.3/(0.1+0.4) + 0.2/(0.1+0.4)
A       A2      0.2   0.4   0.3/(0.1+0.4) + 0.2/(0.1+0.4)
B       B1      1.2   1     1.2/(1+2+1.2) + 1.4/(1+2+1.2) + 0.6/(1+2+1.2)
B       B2      1.4   2     1.2/(1+2+1.2) + 1.4/(1+2+1.2) + 0.6/(1+2+1.2)
B       B3      0.6   1.2   1.2/(1+2+1.2) + 1.4/(1+2+1.2) + 0.6/(1+2+1.2)

One dplyr possibility could be:

library(dplyr)

df %>%
 group_by(col_1) %>%
 mutate(ice_new = sum(ice/sum(fd)))

  col_1 col_2   ice    fd ice_new
  <chr> <chr> <dbl> <dbl>   <dbl>
1 A     A1      0.3   0.1   1    
2 A     A2      0.2   0.4   1    
3 B     B1      1.2   1     0.762
4 B     B2      1.4   2     0.762
5 B     B3      0.6   1.2   0.762

Or the same with base R:

with(df, ave(ice/ave(fd, col_1, FUN = sum), col_1, FUN = sum))

df1 <- data.frame("col_1" = c("A", "A", "B", "B", "B"), 
                        "col_2" = c("A1", "A2", "B1", "B2", "B3"), 
                        "ice" = c(.3,.2,1.2,1.4,.6), 
                        "fd" = c(.1,.4,1,2,1.2))
library(dplyr)
df2 <- df1 %>% 
         group_by(col_1) %>% 
           mutate(ice_new=sum(ice)/sum(fd))

df2
## A tibble: 5 x 5
## Groups:   col_1 [2]
#  col_1 col_2   ice    fd ice_new
#  <fct> <fct> <dbl> <dbl>  <dbl>
#1 A     A1      0.3   0.1  1    
#2 A     A2      0.2   0.4  1    
#3 B     B1      1.2   1    0.762
#4 B     B2      1.4   2    0.762
#5 B     B3      0.6   1.2  0.762
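
The two dplyr answers above write the expression differently, sum(ice/sum(fd)) versus sum(ice)/sum(fd), yet print the same values. A short derivation of why, where g denotes one col_1 group (the index notation is mine, not from the original posts):

```latex
\[
\text{ice\_new}_g
  = \sum_{i \in g} \frac{\text{ice}_i}{\sum_{j \in g} \text{fd}_j}
  = \frac{\sum_{i \in g} \text{ice}_i}{\sum_{j \in g} \text{fd}_j}
\]
% The denominator is the same for every row in the group,
% so it can be pulled out of the outer sum.
\[
\text{A: } \frac{0.3 + 0.2}{0.1 + 0.4} = \frac{0.5}{0.5} = 1,
\qquad
\text{B: } \frac{1.2 + 1.4 + 0.6}{1 + 2 + 1.2} = \frac{3.2}{4.2} \approx 0.762
\]
```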

You could also use summarise to get one value per group:

library(dplyr)
df %>% 
  group_by(col_1) %>%
  summarise(ice_new = sum(ice / sum(fd)))

# A tibble: 2 x 2
  col_1 ice_new
  <chr>   <dbl>
1 A       1    
2 B       0.762
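
Since the question mentions trying aggregate, here is a minimal base R sketch of the same per-group summary built on it. It assumes the sample data from the question; the names df and sums are only illustrative, and this is one possible approach rather than what the original poster attempted.

```r
# Sample data from the question
df <- data.frame(
  col_1 = c("A", "A", "B", "B", "B"),
  col_2 = c("A1", "A2", "B1", "B2", "B3"),
  ice   = c(0.3, 0.2, 1.2, 1.4, 0.6),
  fd    = c(0.1, 0.4, 1, 2, 1.2)
)

# Sum ice and fd within each col_1 group (one row per group)
sums <- aggregate(cbind(ice, fd) ~ col_1, data = df, FUN = sum)

# Ratio of the group sums, equivalent to sum(ice / sum(fd)) per group
sums$ice_new <- sums$ice / sums$fd

# Attach the per-group value back to every original row
result <- merge(df, sums[, c("col_1", "ice_new")], by = "col_1")
result
```

If only one value per group is needed, sums already holds it and the merge step can be skipped.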
