Creating a new variable using group_by and rleid() function

I want to create a new variable using dplyr in R. Here is the data -

year  TICKER   auditor_fkey  pauditor_fkey  AUDTURNOVER
1  2001    AIR            4            NA          NA
2  2002    AIR            4             4           0
3  2003    AIR            4             4           0
4  2004    AIR            4             4           0
5  2005    AIR            4             4           0
6  2006    AIR            4             4           0
7  2007    AIR            4             4           0
8  2008    AIR            4             4           0
9  2009    AIR            4             4           0
10 2010    AIR            4             4           0
11 2011    AIR            4             4           0
12 2012    AIR            4             4           0
13 2013    AIR            4             4           0
14 2014    AIR            4             4           0
15 2015    AIR            4             4           0
16 2016    AIR            4             4           0
17 2017    AIR            4             4           0
18 2000    ABT            5            NA          NA
19 2001    ABT            5             5           0
20 2002    ABT            3             5           1
21 2003    ABT            3             3           0
22 2004    ABT            3             3           0
23 2005    ABT            3             3           0
24 2006    ABT            3             3           0
25 2007    ABT            3             3           0
26 2008    ABT            3             3           0
27 2009    ABT            3             3           0
28 2010    ABT            3             3           0
29 2011    ABT            3             3           0
30 2012    ABT            3             3           0
31 2013    ABT            3             3           0
32 2014    ABT            2             3           1
33 2015    ABT            2             2           0
34 2016    ABT            2             2           0
35 2017    ABT            2             2           0

I created the "pauditor_fkey" variable using the following code -

library(dplyr)

my_data <- my_data %>%
  group_by(TICKER) %>%
  mutate(pauditor_fkey = lag(auditor_fkey))  # previous year's auditor within each company

Here, year = the year; TICKER = identifier of a company; auditor_fkey = identifier of the auditor who audited the company in that year (e.g., auditor "4" audited company "AIR" in 2001); pauditor_fkey = the auditor in the previous year; AUDTURNOVER = 1 if the auditor changed in that year, otherwise 0.

Now I want to create a new variable called AUDITOR_TENURE from the variables above. After running the code, the data should look like this -

year  TICKER   auditor_fkey   pauditor_fkey  AUDTURNOVER  AUDITOR_TENURE
1  2001    AIR            4            NA          NA              1
2  2002    AIR            4             4           0              2
3  2003    AIR            4             4           0              3
4  2004    AIR            4             4           0              4
5  2005    AIR            4             4           0              5
6  2006    AIR            4             4           0              6
7  2007    AIR            4             4           0              7
8  2008    AIR            4             4           0              8
9  2009    AIR            4             4           0              9
10 2010    AIR            4             4           0             10
11 2011    AIR            4             4           0             11
12 2012    AIR            4             4           0             12
13 2013    AIR            4             4           0             13
14 2014    AIR            4             4           0             14
15 2015    AIR            4             4           0             15
16 2016    AIR            4             4           0             16
17 2017    AIR            4             4           0             17
18 2000    ABT            5            NA          NA              1
19 2001    ABT            5             5           0              2
20 2002    ABT            3             5           1              1
21 2003    ABT            3             3           0              2
22 2004    ABT            3             3           0              3
23 2005    ABT            3             3           0              4
24 2006    ABT            3             3           0              5
25 2007    ABT            3             3           0              6
26 2008    ABT            3             3           0              7
27 2009    ABT            3             3           0              8
28 2010    ABT            3             3           0              9
29 2011    ABT            3             3           0             10
30 2012    ABT            3             3           0             11
31 2013    ABT            3             3           0             12
32 2014    ABT            2             3           1              1
33 2015    ABT            2             2           0              2
34 2016    ABT            2             2           0              3
35 2017    ABT            2             2           0              4

Looking at the AUDITOR_TENURE variable: for TICKER == AIR, the auditor never changes, so AUDITOR_TENURE increases by one each year. For TICKER == ABT, the auditor is the same in 2000 and 2001, so AUDITOR_TENURE is 1 and 2 for those years. In 2002, ABT changes auditor and the new auditor stays until 2013, so the count restarts at 1 for that auditor. In 2014, ABT changes auditor again and that auditor stays until 2017, so the tenure restarts once more and is counted accordingly.

I used the following code -

my_data <- my_data %>%  
  group_by(TICKER, group = rleid(auditor_fkey)) %>%
  mutate(AUDITOR_TENURE = row_number()) %>%
  ungroup() 

Alternatively, this one -

my_data <- my_data %>%  
  group_by(TICKER, group = cumsum(auditor_fkey != 
                                    lag(auditor_fkey, default = first(auditor_fkey)))) %>%
  mutate(AUDITOR_TENURE = row_number()) %>%
  ungroup() 

In both cases, I got these results, which are not correct -

    year TICKER auditor_fkey AUDTURNOVER group AUDITOR_TENURE
1  2001    AIR            4          NA     3              1
2  2002    AIR            4           0     6              1
3  2003    AIR            4           0     9              1
4  2004    AIR            4           0    12              1
5  2005    AIR            4           0    15              1
6  2006    AIR            4           0    18              1
7  2007    AIR            4           0    21              1
8  2008    AIR            4           0    24              1
9  2009    AIR            4           0    28              1
10 2010    AIR            4           0    32              1
11 2011    AIR            4           0    35              1
12 2012    AIR            4           0    38              1
13 2013    AIR            4           0    41              1
14 2014    AIR            4           0    44              1
15 2015    AIR            4           0    47              1
16 2016    AIR            4           0    50              1
17 2017    AIR            4           0    53              1
18 2000    ABT            5          NA     1              1
19 2001    ABT            5           0     4              1
20 2002    ABT            3           1     7              1
21 2003    ABT            3           0    10              1
22 2004    ABT            3           0    13              1
23 2005    ABT            3           0    16              1
24 2006    ABT            3           0    19              1
25 2007    ABT            3           0    22              1
26 2008    ABT            3           0    25              1
27 2009    ABT            3           0    29              1
28 2010    ABT            3           0    33              1
29 2011    ABT            3           0    36              1
30 2012    ABT            3           0    39              1
31 2013    ABT            3           0    42              1
32 2014    ABT            2           1    45              1
33 2015    ABT            2           0    48              1
34 2016    ABT            2           0    51              1
35 2017    ABT            2           0    54              1

With data.table, it would simply be DT[, AUDITOR_TENURE := rowid(TICKER, auditor_fkey)]
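
To make the one-liner above concrete, here is a small sketch that rebuilds the question's sample data as a data.table and applies it. The object name `DT` follows the answer; the second column, `TENURE_BY_RUN`, is a hypothetical addition that is not part of the original suggestion. It covers the case where the same auditor could come back after a gap: `rowid(TICKER, auditor_fkey)` counts every occurrence of a company/auditor pair, so it would keep counting after such a gap, while the `rleid()` variant restarts with each uninterrupted run.

```r
library(data.table)

# Sample data rebuilt from the question (AIR: 2001-2017, ABT: 2000-2017).
DT <- data.table(
  year         = c(2001:2017, 2000:2017),
  TICKER       = rep(c("AIR", "ABT"), times = c(17, 18)),
  auditor_fkey = c(rep(4, 17), 5, 5, rep(3, 12), rep(2, 4))
)

# Keep each company's rows together and in year order.
setorder(DT, TICKER, year)

# The suggested one-liner: running count of each (TICKER, auditor_fkey) pair.
DT[, AUDITOR_TENURE := rowid(TICKER, auditor_fkey)]

# Hypothetical variant: count within each consecutive run instead,
# so the tenure restarts if an auditor returns after a gap.
DT[, TENURE_BY_RUN := rowid(rleid(TICKER, auditor_fkey))]
```

On the sample data the two columns agree, because no auditor returns to a company after leaving it.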
