简体   繁体   中英

How to sum values of one column, based on two conditions, grouped by another column value, in R?

I have a dataset that includes many "transects", and a multiple "transects" comprise a "plane" (eg Plane P1 = Transect T1 + Transect T2)

The current data structure (see example below) has the length of each transect repeated in the column "transect length" for each sample that was collected in the transect. I need to sum the transect lengths to get the "plane length" for EACH plane.

Here is what the data structure I've been given looks like:

Plane name Transect name Transect length Sample
P1 T1 1002 x
P1 T1 1002 y
P1 T2 1034 z
P1 T2 1034 a
P1 T2 1034 b
P2 T3 1222 c
P2 T3 1222 d
P2 T4 1324 e

I added an "index" column, thinking there must be some way I can add the transect lengths when the index for that row = 1.

data3 <- data2 %>%
  group_by(transect)%>%
  mutate(index = seq(n())) %>%
  ungroup() %>%

Table showing the added index column:

Plane name Transect name Transect length Sample Index
P1 T1 1002 x 1
P1 T1 1002 y 2
P1 T2 1034 z 1
P1 T2 1034 a 2
P1 T2 1034 b 3
P2 T3 1222 c 1
P2 T3 1222 d 2
P2 T4 1324 e 1

But now I am stuck on how to add only the transect lengths with index = 1 for a certain plane, and iterate over all the planes...

So for example, in the end, what I need it to look like is this:

Plane name Transect name Transect length Sample Index Plane length
P1 T1 1002 x 1 2036
P1 T1 1002 y 2 2036
P1 T2 1034 z 1 2036
P1 T2 1034 a 2 2036
P1 T2 1034 b 3 2036
P2 T3 1222 c 1 2546
P2 T3 1222 d 2 2546
P2 T4 1324 e 1 2546

Notice the plane length for plane P1 is the sum of only the T1 transect lengths that have index = 1. So: 1002 + 1034 = 2036, NOT 1002 + 1002 + 1034 + 1034 + 1034 = 5106.

I tried several variations of code using mutate, sum, summarize, etc. with no luck:

plane_lengths <- data3 %>% 
  mutate(plane_length = sum(transect_length[index ==1]))

^ this one seems to do nothing. :(

plane_lengths <- data3 %>% group_by(plane_name) %>% summarize(plane_length = sum(transect_length))

^ this one just adds every value of transect length (all index values, not just index = 1)

Is there some way to modify the code above to take only values for transect length where the index = 1?

I am still a bit new to R, and I really appreciate any help I can get.!! Thank you.

After creating the 'index', change the grouping to 'Planename' before we create the 'PlaneLength' column

library(dplyr)
data2 %>% 
   group_by(Transectname) %>% 
   mutate(index = row_number()) %>% 
   group_by(Planename) %>% 
   mutate(PlaneLength = sum(Transectlength[index == 1])) %>%
   ungroup

-output

# A tibble: 8 × 6
  Planename Transectname Transectlength Sample index PlaneLength
  <chr>     <chr>                 <int> <chr>  <int>       <int>
1 P1        T1                     1002 x          1        2036
2 P1        T1                     1002 y          2        2036
3 P1        T2                     1034 z          1        2036
4 P1        T2                     1034 a          2        2036
5 P1        T2                     1034 b          3        2036
6 P2        T3                     1222 c          1        2546
7 P2        T3                     1222 d          2        2546
8 P2        T4                     1324 e          1        2546

We may also use with_groups so that it won't create the group attrribute

data2 %>% 
  with_groups(Transectname, mutate, index = row_number()) %>%
  with_groups(Planename, mutate, 
    PlaneLength = sum(Transectlength[index == 1]))

-output

 Planename Transectname Transectlength Sample index PlaneLength
1        P1           T1           1002      x     1        2036
2        P1           T1           1002      y     2        2036
3        P1           T2           1034      z     1        2036
4        P1           T2           1034      a     2        2036
5        P1           T2           1034      b     3        2036
6        P2           T3           1222      c     1        2546
7        P2           T3           1222      d     2        2546
8        P2           T4           1324      e     1        2546

data

data2 <- structure(list(Planename = c("P1", "P1", "P1", "P1", "P1", "P2", 
"P2", "P2"), Transectname = c("T1", "T1", "T2", "T2", "T2", "T3", 
"T3", "T4"), Transectlength = c(1002L, 1002L, 1034L, 1034L, 1034L, 
1222L, 1222L, 1324L), Sample = c("x", "y", "z", "a", "b", "c", 
"d", "e")), class = "data.frame", row.names = c(NA, -8L))

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM