I want to create a new variable using dplyr in R. Here is the data:
year TICKER auditor_fkey pauditor_fkey AUDTURNOVER
1 2001 AIR 4 NA NA
2 2002 AIR 4 4 0
3 2003 AIR 4 4 0
4 2004 AIR 4 4 0
5 2005 AIR 4 4 0
6 2006 AIR 4 4 0
7 2007 AIR 4 4 0
8 2008 AIR 4 4 0
9 2009 AIR 4 4 0
10 2010 AIR 4 4 0
11 2011 AIR 4 4 0
12 2012 AIR 4 4 0
13 2013 AIR 4 4 0
14 2014 AIR 4 4 0
15 2015 AIR 4 4 0
16 2016 AIR 4 4 0
17 2017 AIR 4 4 0
18 2000 ABT 5 NA NA
19 2001 ABT 5 5 0
20 2002 ABT 3 5 1
21 2003 ABT 3 3 0
22 2004 ABT 3 3 0
23 2005 ABT 3 3 0
24 2006 ABT 3 3 0
25 2007 ABT 3 3 0
26 2008 ABT 3 3 0
27 2009 ABT 3 3 0
28 2010 ABT 3 3 0
29 2011 ABT 3 3 0
30 2012 ABT 3 3 0
31 2013 ABT 3 3 0
32 2014 ABT 2 3 1
33 2015 ABT 2 2 0
34 2016 ABT 2 2 0
35 2017 ABT 2 2 0
I created the "pauditor_fkey" variable using the following code:
my_data <- my_data %>%
  group_by(TICKER) %>%
  mutate(pauditor_fkey = lag(auditor_fkey))
Here, year = the fiscal year; TICKER = the identifier of a company; auditor_fkey = the identifier of the auditor who audited the company in that year (e.g., auditor "4" audited company "AIR" in "2001"); pauditor_fkey = the company's auditor in the previous year; AUDTURNOVER = 1 if the auditor changed in that year, otherwise 0.
Now I want to create a new variable called AUDITOR_TENURE using the above variables. After running the code, the data should look like this:
year TICKER auditor_fkey pauditor_fkey AUDTURNOVER AUDITOR_TENURE
1 2001 AIR 4 NA NA 1
2 2002 AIR 4 4 0 2
3 2003 AIR 4 4 0 3
4 2004 AIR 4 4 0 4
5 2005 AIR 4 4 0 5
6 2006 AIR 4 4 0 6
7 2007 AIR 4 4 0 7
8 2008 AIR 4 4 0 8
9 2009 AIR 4 4 0 9
10 2010 AIR 4 4 0 10
11 2011 AIR 4 4 0 11
12 2012 AIR 4 4 0 12
13 2013 AIR 4 4 0 13
14 2014 AIR 4 4 0 14
15 2015 AIR 4 4 0 15
16 2016 AIR 4 4 0 16
17 2017 AIR 4 4 0 17
18 2000 ABT 5 NA NA 1
19 2001 ABT 5 5 0 2
20 2002 ABT 3 5 1 1
21 2003 ABT 3 3 0 2
22 2004 ABT 3 3 0 3
23 2005 ABT 3 3 0 4
24 2006 ABT 3 3 0 5
25 2007 ABT 3 3 0 6
26 2008 ABT 3 3 0 7
27 2009 ABT 3 3 0 8
28 2010 ABT 3 3 0 9
29 2011 ABT 3 3 0 10
30 2012 ABT 3 3 0 11
31 2013 ABT 3 3 0 12
32 2014 ABT 2 3 1 1
33 2015 ABT 2 2 0 2
34 2016 ABT 2 2 0 3
35 2017 ABT 2 2 0 4
Looking at the AUDITOR_TENURE variable: for TICKER == AIR, the auditor never changes, so AUDITOR_TENURE keeps increasing. For TICKER == ABT, the auditor is the same in 2000 and 2001, so AUDITOR_TENURE is 1 and 2 for those years. In 2002, ABT changes auditor, and the new auditor stays until 2013, so the count restarts at 1 and runs to 12. In 2014, ABT changes auditor again, and that auditor stays until 2017, so the tenure restarts at 1 and is counted accordingly.
I used the following code:
my_data <- my_data %>%
  group_by(TICKER, group = rleid(auditor_fkey)) %>%
  mutate(AUDITOR_TENURE = row_number()) %>%
  ungroup()
Alternatively, this one:
my_data <- my_data %>%
  group_by(TICKER, group = cumsum(auditor_fkey !=
    lag(auditor_fkey, default = first(auditor_fkey)))) %>%
  mutate(AUDITOR_TENURE = row_number()) %>%
  ungroup()
In both cases, I got the following results, which are not correct:
year TICKER auditor_fkey AUDTURNOVER group AUDITOR_TENURE
1 2001 AIR 4 NA 3 1
2 2002 AIR 4 0 6 1
3 2003 AIR 4 0 9 1
4 2004 AIR 4 0 12 1
5 2005 AIR 4 0 15 1
6 2006 AIR 4 0 18 1
7 2007 AIR 4 0 21 1
8 2008 AIR 4 0 24 1
9 2009 AIR 4 0 28 1
10 2010 AIR 4 0 32 1
11 2011 AIR 4 0 35 1
12 2012 AIR 4 0 38 1
13 2013 AIR 4 0 41 1
14 2014 AIR 4 0 44 1
15 2015 AIR 4 0 47 1
16 2016 AIR 4 0 50 1
17 2017 AIR 4 0 53 1
18 2000 ABT 5 NA 1 1
19 2001 ABT 5 0 4 1
20 2002 ABT 3 1 7 1
21 2003 ABT 3 0 10 1
22 2004 ABT 3 0 13 1
23 2005 ABT 3 0 16 1
24 2006 ABT 3 0 19 1
25 2007 ABT 3 0 22 1
26 2008 ABT 3 0 25 1
27 2009 ABT 3 0 29 1
28 2010 ABT 3 0 33 1
29 2011 ABT 3 0 36 1
30 2012 ABT 3 0 39 1
31 2013 ABT 3 0 42 1
32 2014 ABT 2 1 45 1
33 2015 ABT 2 0 48 1
34 2016 ABT 2 0 51 1
35 2017 ABT 2 0 54 1
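The group values in the output above change on every row, which suggests (as a guess, since the full data is not shown) that the run ids were computed over the whole table while rows of different companies were interleaved, so no two adjacent rows shared an auditor run. A minimal dplyr sketch under that assumption sorts by TICKER and year first and computes the run id inside each company; the toy tibble and the helper column "run" are made up for illustration:

```r
library(dplyr)

# Toy data, deliberately interleaved across companies (illustrative values)
my_data <- tibble(
  year         = c(2000, 2001, 2001, 2002, 2002, 2003),
  TICKER       = c("ABT", "AIR", "ABT", "AIR", "ABT", "AIR"),
  auditor_fkey = c(5, 4, 5, 4, 3, 4)
)

my_data <- my_data %>%
  arrange(TICKER, year) %>%            # make each company's rows contiguous, in year order
  group_by(TICKER) %>%
  # "run" is a hypothetical helper: it increases by 1 whenever the auditor changes
  mutate(run = cumsum(auditor_fkey !=
                        lag(auditor_fkey, default = first(auditor_fkey)))) %>%
  group_by(TICKER, run) %>%
  mutate(AUDITOR_TENURE = row_number()) %>%  # count years within each auditor run
  ungroup() %>%
  select(-run)
```

Because the run id is built in a separate mutate() after group_by(TICKER), the comparison with the previous row never crosses from one company to the next.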
With data.table, it would simply be DT[, AUDITOR_TENURE := rowid(TICKER, auditor_fkey)]
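To make the one-liner concrete, here is a small runnable sketch on toy data shaped like the question's (the values and the second column name AUDITOR_TENURE_RUN are made up for illustration). Note that rowid() numbers the rows of each TICKER/auditor combination, so it assumes the rows are in year order and that an auditor never comes back to the same company after leaving; the rleid() variant restarts the count for every consecutive run instead.

```r
library(data.table)

# Toy data in the same shape as the question (illustrative values)
DT <- data.table(
  year         = c(2001:2003, 2000:2005),
  TICKER       = c(rep("AIR", 3), rep("ABT", 6)),
  auditor_fkey = c(4, 4, 4, 5, 5, 3, 3, 2, 2)
)

# Sort so each company's rows are contiguous and in year order
setorder(DT, TICKER, year)

# Approach from the answer: running count per (TICKER, auditor) combination
DT[, AUDITOR_TENURE := rowid(TICKER, auditor_fkey)]

# Variant (an assumption, not part of the answer): restart the count for each
# consecutive run, which also handles an auditor returning later
DT[, AUDITOR_TENURE_RUN := seq_len(.N), by = rleid(TICKER, auditor_fkey)]
```

On data like the question's, where no auditor returns, both columns should agree.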