
Creating a variable that has a value based on another variable in R (panel data)

I have a data frame containing panel data with patent and economic information for the period 2012-2020. I have a time-invariant variable, investment_year, which is the year in which a company received its initial investment. patent_applications is the annual number of patents filed by a company. Company A, for example, filed 5 patents in 2018, 2 in 2019, and so on.

company_name   investment_year   year   patent_applications
A              2018              2020   7
A              2018              2019   2
A              2018              2018   5
...            ...               ...    ...
A              2018              2012   4
B              2015              2020   10
B              2015              2019   3
B              2015              2018   7
...            ...               ...    ...

I would like to create a variable that contains the number of applications at t+2, where t is the investment year. For example, for Company A the number of applications at t+2 (patent_applications_t2) would be 7, because its investment year (2018) + 2 equals 2020.
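Here is a minimal sketch of one way to build this variable with dplyr. It is my own illustration, not code from the thread. The idea is to first pull out the single row per company where year equals investment_year + 2, and then join that value back onto every row of the same company. The data are the example rows from the table above.

```r
library(dplyr)

# Example panel, using the rows shown in the table above
df <- data.frame(
  company_name        = c("A", "A", "A", "A", "B", "B", "B"),
  investment_year     = c(2018, 2018, 2018, 2018, 2015, 2015, 2015),
  year                = c(2020, 2019, 2018, 2012, 2020, 2019, 2018),
  patent_applications = c(7, 2, 5, 4, 10, 3, 7)
)

# One row per company: the applications filed exactly two years after investment
t2_lookup <- df %>%
  filter(year == investment_year + 2) %>%
  select(company_name, patent_applications_t2 = patent_applications)

# Attach that value to every row of the same company;
# companies with no observation at t+2 get NA
df <- df %>% left_join(t2_lookup, by = "company_name")
df
```

Company B has no 2017 row in this excerpt, so it gets NA. The join assumes there is at most one row per company and year. Duplicate rows would multiply the output rows.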

I tried the line of code below, but it does not produce the correct result.

df$patent_applications_t2 <- df$patent_applications[df$Year == df$Investment_Year + 2]
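The line above fails for two reasons. First, R column names are case-sensitive and the columns are named year and investment_year, so df$Year and df$Investment_Year are NULL and the index is empty. Second, even with the right names, a logical index returns only the matching rows. That gives a vector shorter than the data frame, and it is not aligned by company.

A minimal base R sketch of a per-row lookup follows. It builds a company/year key and uses match(). The key-pasting approach is an illustrative choice, not the original poster's code.

```r
# Example panel, using the rows shown in the table above
df <- data.frame(
  company_name        = c("A", "A", "A", "A", "B", "B", "B"),
  investment_year     = c(2018, 2018, 2018, 2018, 2015, 2015, 2015),
  year                = c(2020, 2019, 2018, 2012, 2020, 2019, 2018),
  patent_applications = c(7, 2, 5, 4, 10, 3, 7)
)

# Key of the row each observation should look up: same company, year t + 2
target_key <- paste(df$company_name, df$investment_year + 2)
# Key identifying each existing row: company and its own year
row_key <- paste(df$company_name, df$year)

# match() gives the position of the first row whose key equals the target,
# or NA when the company has no observation at t + 2 (indexing by NA yields NA)
df$patent_applications_t2 <- df$patent_applications[match(target_key, row_key)]
df
```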

Answer: There must be a better way to accomplish what you are looking for, but here is what I came up with.

library(tidyverse)

tbl <- tribble(
  ~company_name, ~investment_year, ~year, ~patent_applications,
  "A",           2018,             2020,  7,
  "A",           2018,             2019,  2,
  "A",           2018,             2018,  5,
  "A",           2018,             2012,  4,
  "B",           2015,             2020,  10,
  "B",           2015,             2019,  3,
  "B",           2015,             2018,  7
)

tbl %>%
  group_by(company_name) %>%
  arrange(investment_year, year) %>%
  # t2 = 1 in the investment year (t) and the following year (t+1), 0 otherwise
  mutate(t2 = ifelse(year - investment_year <= 1 & year - investment_year >= 0, 1, 0)) %>%
  # running total of applications over the flagged years, set to 0 elsewhere
  mutate(cumulative_application = t2 * cumsum(patent_applications * t2)) %>%
  ungroup() %>%
  arrange(company_name) %>%
  select(company_name, investment_year, year, patent_applications, cumulative_application)

You get this result:

# A tibble: 7 x 5
  company_name investment_year  year patent_applications cumulative_application
  <chr>                  <dbl> <dbl>               <dbl>                  <dbl>
1 A                       2018  2012                   4                      0
2 A                       2018  2018                   5                      5
3 A                       2018  2019                   2                      7
4 A                       2018  2020                   7                      0
5 B                       2015  2018                   7                      0
6 B                       2015  2019                   3                      0
7 B                       2015  2020                  10                      0

I chose to show the cumulative number of applications, but you can easily show only the second entry instead.
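In the answer's code, the flag t2 equals 1 in the investment year and the year after it (t and t+1), not at t+2. The sketch below adapts the same flag-and-multiply idea so that it picks out exactly t+2. The names is_t2 and patent_applications_t2 are illustrative.

```r
library(dplyr)
library(tibble)

tbl <- tribble(
  ~company_name, ~investment_year, ~year, ~patent_applications,
  "A",           2018,             2020,  7,
  "A",           2018,             2019,  2,
  "A",           2018,             2018,  5,
  "A",           2018,             2012,  4,
  "B",           2015,             2020,  10,
  "B",           2015,             2019,  3,
  "B",           2015,             2018,  7
)

tbl <- tbl %>%
  group_by(company_name) %>%
  mutate(
    # 1 only in the row where year is exactly investment_year + 2
    is_t2 = as.integer(year - investment_year == 2),
    # Sum over the flagged row: the t+2 count, repeated on every row of the company
    patent_applications_t2 = sum(patent_applications * is_t2)
  ) %>%
  ungroup()
tbl
```

Because this version sums, a company with no observation at t+2 (company B here) gets 0 rather than NA. That can be ambiguous if 0 applications is also a valid value.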

Another (probably better) solution would be to write a function using within(). Hope this helps a bit.
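The answer does not show the within() version. One possible base R sketch pairs within() with ave() for the per-company grouping. Using ave() is my own choice here and is not specified in the answer.

```r
# Example panel, using the rows shown in the table above
df <- data.frame(
  company_name        = c("A", "A", "A", "A", "B", "B", "B"),
  investment_year     = c(2018, 2018, 2018, 2018, 2015, 2015, 2015),
  year                = c(2020, 2019, 2018, 2012, 2020, 2019, 2018),
  patent_applications = c(7, 2, 5, 4, 10, 3, 7)
)

df <- within(df, {
  # ave() passes each company's row indices to FUN; the single value FUN
  # returns is recycled to all rows of that company
  patent_applications_t2 <- ave(
    seq_along(year), company_name,
    FUN = function(i) {
      # rows of this company observed exactly two years after investment
      hit <- i[year[i] == investment_year[i] + 2]
      if (length(hit) == 0) NA else patent_applications[hit[1]]
    }
  )
})
df
```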

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 