I'm currently using the Ethnic Power Relations 2014 data set. Here's a small snippet of the data that I'm trying to manipulate:
statename from to gwgroupid size
1 United States 1966 2008 201000 0.691
2 United States 1966 2008 201000 0.125
3 United States 1966 2008 203000 0.124
where from and to are the first and last year of the observation, and gwgroupid is a marker for a particular ethnic group in a particular country.
I'd like to expand the data set so that it records an observation for every year in the range delineated by from and to, and then drops the from and to columns. The first three rows of the expanded data set would look like:
statename year gwgroupid size
1 United States 1966 201000 0.691
2 United States 1967 201000 0.691
3 United States 1968 201000 0.691
How can I do this given that each country has a different range of years?
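One detail worth pinning down before expanding: both endpoints are included, so a row running from 1966 to 2008 should become to - from + 1 = 43 rows, not 42. A quick base R check (the from/to vectors below are just the spans of the three sample rows, typed in by hand):

```r
# The from/to spans of the three sample rows, typed in by hand
from <- c(1966, 1966, 1966)
to   <- c(2008, 2008, 2008)

# Both endpoints count, so each row should expand to to - from + 1 rows
years_per_row <- to - from + 1
years_per_row                   # 43 43 43
sum(years_per_row)              # 129 rows in the expanded data
length(seq(from[1], to[1]))     # 43, agrees with seq()
```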
You can use the unnest function from the tidyr package:
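library(tidyr)
library(dplyr)  # select() comes from dplyr, not tidyr
# list column: one vector of years (from, from + 1, ..., to) per row
df$year <- mapply(seq, df$from, df$to, SIMPLIFY = FALSE)
df %>%
  unnest(year) %>%     # one row per element of each year vector
  select(-from, -to)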
# statename gwgroupid size year
#1 UnitedStates 201000 0.691 1966
#2 UnitedStates 201000 0.691 1967
#3 UnitedStates 201000 0.691 1968
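If it helps to see what unnest() is doing with the list column, here is a base R sketch of the same step. The data frame is a hand-typed copy of the sample rows, and the gwgroupid values are assumed to be distinct. Each row is repeated once per element of its year vector, and the vectors are then laid end to end in the same order:

```r
# Hand-typed copy of the sample rows (gwgroupid assumed distinct per row)
df <- data.frame(
  statename = "UnitedStates",
  from = 1966, to = 2008,
  gwgroupid = c(201000, 202000, 203000),
  size = c(0.691, 0.125, 0.124)
)

# One vector of years per row, stored as a list
years <- mapply(seq, df$from, df$to, SIMPLIFY = FALSE)

# "Unnest" by hand: repeat each row once per element of its vector...
idx <- rep(seq_len(nrow(df)), lengths(years))
expanded <- df[idx, c("statename", "gwgroupid", "size")]
# ...then concatenate the vectors in the same row order
expanded$year <- unlist(years)
rownames(expanded) <- NULL
head(expanded, 3)
```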
[Update] Alternatively, you can use the data.table package:
library(data.table)
# one group per original row (assumes statename/gwgroupid/size identify each row)
as.data.table(df)[, .(year = seq(from, to)), by = .(statename, gwgroupid, size)]
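A caveat on the data.table version: it relies on statename, gwgroupid and size together identifying a single row. If two rows shared all three values, from and to inside j would hold more than one value and seq() would not give a per-row range. A base R sketch that works row by row instead, so it does not depend on unique keys (sample data typed in by hand, as an assumption):

```r
# Hand-typed copy of the sample rows
df <- data.frame(
  statename = "UnitedStates",
  from = 1966, to = 2008,
  gwgroupid = c(201000, 202000, 203000),
  size = c(0.691, 0.125, 0.124)
)

# Build one small data frame per original row (indexed by position, so
# duplicate statename/gwgroupid/size combinations are still kept apart)
pieces <- lapply(seq_len(nrow(df)), function(i) {
  data.frame(
    statename = df$statename[i],
    year      = seq(df$from[i], df$to[i]),  # scalar columns are recycled
    gwgroupid = df$gwgroupid[i],
    size      = df$size[i]
  )
})

# Stack the pieces in their original order
expanded <- do.call(rbind, pieces)
rownames(expanded) <- NULL
```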
This works, although there may be a cleaner, quicker way.
Your data:
df<-
read.table(text="
statename from to gwgroupid size
UnitedStates 1966 2008 201000 0.691
UnitedStates 1966 2008 202000 0.125
UnitedStates 1966 2008 203000 0.124", header=TRUE)
library(dplyr)
# number of years per row, counting both endpoints (1966-2008 is 43 years)
df$freq <- df$to - df$from + 1
# repeat each row freq times, keeping only the original five columns
df.expanded <- df[rep(row.names(df), df$freq), 1:5]
df.expanded %>%
  # restart the counter for every ethnic group, not just every country
  group_by(statename, gwgroupid) %>%
  mutate(year = from + row_number() - 1) %>%
  select(statename, year, gwgroupid, size)
to get:
statename year gwgroupid size
1 UnitedStates 1966 201000 0.691
2 UnitedStates 1967 201000 0.691
3 UnitedStates 1968 201000 0.691
4 UnitedStates 1969 201000 0.691
5 UnitedStates 1970 201000 0.691
6 UnitedStates 1971 201000 0.691
7 UnitedStates 1972 201000 0.691
8 UnitedStates 1973 201000 0.691
9 UnitedStates 1974 201000 0.691
10 UnitedStates 1975 201000 0.691
.. ... ... ... ...
Edit: I just noticed that your desired result seems to require gwgroupid to increase over rows 1-3 while size stays the same. Is your desired result correct?
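The rep() idea can also be written in base R without dplyr. Two details matter: the year count includes both endpoints (to - from + 1), and the running counter has to restart for every original row rather than every country, since grouping by statename alone keeps counting across all three ethnic groups. A sketch under those assumptions, using ave() to number the copies within each original row (sample data typed in by hand):

```r
# Hand-typed copy of the sample rows
df <- data.frame(
  statename = "UnitedStates",
  from = 1966, to = 2008,
  gwgroupid = c(201000, 202000, 203000),
  size = c(0.691, 0.125, 0.124)
)

n <- df$to - df$from + 1                # inclusive number of years per row
orig_row <- rep(seq_len(nrow(df)), n)   # which original row each copy came from
expanded <- df[orig_row, ]

# Counter 1, 2, ..., n that restarts for each original row
counter <- ave(orig_row, orig_row, FUN = seq_along)
expanded$year <- expanded$from + counter - 1   # first year equals `from`

expanded <- expanded[, c("statename", "year", "gwgroupid", "size")]
rownames(expanded) <- NULL
```

Keying the counter on the original row position, rather than on gwgroupid, also keeps the sketch correct if two rows happen to share a group id.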