简体   繁体   中英

Expand year range in R

I'm currently using the Ethnic Power Relations 2014 data set . Here's a small snippet of the data that I'm trying to manipulate:

     statename       from   to    gwgroupid size
[,1] United States   1966   2008  201000    0.691
[,2] United States   1966   2008  201000    0.125
[,3] United States   1966   2008  203000    0.124

where from and to are the first and last year of the observation, and gwgroupid is a marker for a particular ethnic group in a particular country.

I'd like to expand the data set so that it records an observation for every year in the range delineated by from and to , and then deletes from and to . The first three rows of the expanded data set would look like:

     statename       year    gwgroupid size
[,1] United States   1966    201000    0.691
[,2] United States   1967    201000    0.691
[,3] United States   1968    201000    0.691

How can I do this given that each country has a different range of years?

You can use the unnest function from the tidyr package:

library(tidyr)

df$year <- mapply(seq,df$from,df$to,SIMPLIFY=FALSE)

df %>% 
  unnest(year) %>% 
  select(-from,-to)

#       statename gwgroupid  size year
#1   UnitedStates    201000 0.691 1966
#2   UnitedStates    201000 0.691 1967
#3   UnitedStates    201000 0.691 1968

[Update] Alternatively, you can use the data.table package:

library(data.table)
as.data.table(df)[,.(year=seq(from,to)),by=.(statename,gwgroupid,size)]

This does it... there may be a cleaner, quicker way-

your data:

df<-
read.table(text="
statename       from   to    gwgroupid size
UnitedStates   1966   2008  201000    0.691
UnitedStates   1966   2008  202000    0.125
UnitedStates   1966   2008  203000    0.124", header=T)


library(dplyr)
df$freq <- df$to - df$from
    df.expanded <- df[rep(row.names(df), df$freq), 1:5]


    df.expanded %>% 
      group_by(statename) %>%
      mutate(year = from + row_number(from)) %>% 
     select(statename, year, gwgroupid, size) 

to get:

      statename year gwgroupid  size
1  UnitedStates 1967    201000 0.691
2  UnitedStates 1968    201000 0.691
3  UnitedStates 1969    201000 0.691
4  UnitedStates 1970    201000 0.691
5  UnitedStates 1971    201000 0.691
6  UnitedStates 1972    201000 0.691
7  UnitedStates 1973    201000 0.691
8  UnitedStates 1974    201000 0.691
9  UnitedStates 1975    201000 0.691
10 UnitedStates 1976    201000 0.691
..          ...  ...       ...   ...

edit: just noticed that your results require 'gwgroupid' to increase on rows 1-3 but the size stays the same.... is your desired result correct?

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM