Using mutate to create a new column with the first value of each group in R

I'm working on a sabermetrics research project and have been stuck all day trying to create a new column in a data frame that shows the starting pitcher for a given game. In the sample below I have data for 'a' and 'b', but I can't figure out how to create 'c' as the first value of 'b' within each unique value of 'a'. This should be easy, but I just started learning R.

    a   b   c
1   1   1   1
2   1   2   1
3   1   3   1
4   1   4   1
5   1   5   1
6   1   6   1
7   2   7   7
8   2   8   7
9   2   1   7
10  2   2   7
11  2   3   7
12  2   4   7
13  3   5   5
14  3   6   5
15  3   7   5

So far I've used mutate and group_by to come up with: sample <- sample %>% group_by(a) %>% mutate(c = first(b)). But this sets every value of 'c' to the first value of 'b' in the whole data frame, so in the sample above my current code makes every value of 'c' equal to 1. What am I missing? Any suggestions?

Not so elegant, but it works. I hope it works for you too:

df1 %>% group_by(a) %>% mutate(c = rep(first(b), length(a)))
Source: local data frame [15 x 3]
Groups: a [3]

       a     b     c
   (int) (int) (int)
1      1     1     1
2      1     2     1
3      1     3     1
4      1     4     1
5      1     5     1
6      1     6     1
7      2     7     7
8      2     8     7
9      2     1     7
10     2     2     7
11     2     3     7
12     2     4     7
13     3     5     5
14     3     6     5
15     3     7     5
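For reference, here is a minimal, self-contained sketch of the grouped approach. The data frame is rebuilt from the sample in the question. The calls are namespace-qualified (dplyr::mutate and so on) so that a mutate() exported by another attached package cannot be picked up instead; plyr's mutate(), for example, ignores grouping. Whether masking is what caused the problem in the question is only an assumption, since the question does not show which packages were loaded.

```r
# Rebuild the sample from the question (column c is the one to derive).
df1 <- data.frame(
  a = c(rep(1L, 6), rep(2L, 6), rep(3L, 3)),
  b = c(1:6, 7L, 8L, 1:4, 5:7)
)

# Namespace-qualified calls make sure the dplyr versions are used,
# even if another attached package also exports mutate().
res <- dplyr::mutate(dplyr::group_by(df1, a), c = dplyr::first(b))
res <- dplyr::ungroup(res)  # drop the grouping once c has been computed

print(as.data.frame(res))
```

Inside a grouped mutate() a length-one result is recycled across the group, so wrapping first(b) in rep() is not required.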

Using the dplyr library, you can do something like this:

library(dplyr)
df %>% group_by(a) %>% mutate(c = b[1])

Output is as follows:

Source: local data frame [15 x 3]
Groups: a [3]

       a     b     c
   (int) (int) (int)
1      1     1     1
2      1     2     1
3      1     3     1
4      1     4     1
5      1     5     1
6      1     6     1
7      2     7     7
8      2     8     7
9      2     1     7
10     2     2     7
11     2     3     7
12     2     4     7
13     3     5     5
14     3     6     5
15     3     7     5

Converting the columns to the types mentioned in the comments below and running the same code still produces the desired output:

df$b <- as.factor(df$b)
df$a <- as.character(df$a)
str(df)
'data.frame':   15 obs. of  3 variables:
 $ a: chr  "1" "1" "1" "1" ...
 $ b: Factor w/ 8 levels "1","2","3","4",..: 1 2 3 4 5 6 7 8 1 2 ...
 $ c: int  1 1 1 1 1 1 7 7 7 7 ...

df %>% group_by(a) %>% mutate(c = b[1])
Source: local data frame [15 x 3]
Groups: a [3]

       a      b      c
   (chr) (fctr) (fctr)
1      1      1      1
2      1      2      1
3      1      3      1
4      1      4      1
5      1      5      1
6      1      6      1
7      2      7      7
8      2      8      7
9      2      1      7
10     2      2      7
11     2      3      7
12     2      4      7
13     3      5      5
14     3      6      5
15     3      7      5

We can use base R:

df1$c <- with(df1, ave(b, a, FUN = function(x) head(x, 1)))
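As a runnable sketch of the base R route, the snippet below rebuilds the question's sample and computes the column two ways: with ave(), as above, and with match(), which is an alternative idiom not taken from the answer. match(a, a) returns the position of the first occurrence of each value of a, so indexing b with it picks the first b of each group. The names c_ave and c_match are hypothetical and used only for the comparison.

```r
# Rebuild the sample from the question.
df1 <- data.frame(
  a = c(rep(1L, 6), rep(2L, 6), rep(3L, 3)),
  b = c(1:6, 7L, 8L, 1:4, 5:7)
)

# ave() applies FUN within each level of a and spreads the
# length-one result back over every row of that group.
c_ave <- ave(df1$b, df1$a, FUN = function(x) x[1])

# match(a, a) gives, for every row, the index of the first row
# with the same value of a; b at that index is the group's first b.
c_match <- df1$b[match(df1$a, df1$a)]

df1$c <- c_ave
print(df1)
```

The match() form does not depend on the column type of b, which can be convenient when b is a factor or a character vector.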

Or with data.table:

library(data.table)
setDT(df1)[, c:= head(b, 1), by = a]
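A self-contained version of the data.table approach might look like the sketch below. It uses as.data.table(), which makes a copy, rather than setDT(), which converts the data frame in place; that substitution is a choice made here so the original df1 stays untouched, not something the answer requires.

```r
library(data.table)

# Rebuild the sample from the question.
df1 <- data.frame(
  a = c(rep(1L, 6), rep(2L, 6), rep(3L, 3)),
  b = c(1:6, 7L, 8L, 1:4, 5:7)
)

# as.data.table() copies; setDT(df1) would convert df1 by reference instead.
dt <- as.data.table(df1)

# := adds column c by reference; by = a evaluates b[1L] once per group
# and recycles the single value over the rows of that group.
dt[, c := b[1L], by = a]

print(dt)
```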

 