I've been using wide table format to create a migration variable (year, municipality -> year, municipality, move) and was wondering if I can flip it back into long table format. However, I now have two groups of columns per year (municip and move) instead of one. I looked through the existing posts on SO, but couldn't find anything similar.
Here's what I have done:
library(tidyverse)
library(rlang)
# sample data
mydata <- data.frame(id = sort(rep(1:10,3)),
year = rep(seq(2009,2011),10),
municip = sample(c(NA,1:3),30,replace=TRUE))
The data looks like this:
id | year | municip |
---|---|---|
1 | 2009 | 2 |
1 | 2010 | 1 |
1 | 2011 | 3 |
2 | 2009 | 1 |
2 | 2010 | 1 |
2 | 2011 | 3 |
3 | 2009 | NA |
3 | 2010 | NA |
3 | 2011 | NA |
# turn sideways
mydata.wide <- mydata %>%
pivot_wider(names_from = year,
names_prefix = "municip.",
values_from = municip)
Now it looks like this:
id | municip.2009 | municip.2010 | municip.2011 |
---|---|---|---|
1 | 2 | 1 | 3 |
2 | 1 | 1 | 3 |
3 | NA | NA | NA |
4 | 1 | NA | 3 |
5 | 1 | NA | 2 |
6 | 3 | 2 | 2 |
7 | 2 | NA | 3 |
8 | 3 | NA | 3 |
9 | NA | 1 | NA |
10 | 1 | NA | 2 |
Then I'm adding a migration variable (in reality this is done for 12 years):
# create migration variable
for (i in 2009:2010){
text.string <- paste0("mydata.wide <- mydata.wide %>%
mutate(move.",i+1," = case_when(
is.na(municip.",i,") & is.na(municip.",i+1,") ~ \"NA\",
is.na(municip.",i,") & !is.na(municip.",i+1,") ~ \"1\",
!is.na(municip.",i,") & !is.na(municip.",i+1,")
& municip.",i," != municip.",i+1," ~ \"3\",
!is.na(municip.",i,") & is.na(municip.",i+1,") ~ \"4\",
TRUE ~ \"2\"
))")
eval(parse_expr(text.string))
}
# NA: missing in both cases
# 1: move into region
# 2: stayed in region
# 3: moved within region
# 4: moved out of region
Now the table looks like this:
id | municip.2009 | municip.2010 | municip.2011 | move.2010 | move.2011 |
---|---|---|---|---|---|
1 | 2 | 1 | 3 | 3 | 3 |
2 | 1 | 1 | 3 | 2 | 3 |
3 | NA | NA | NA | NA | NA |
4 | 1 | NA | 3 | 4 | 1 |
5 | 1 | NA | 2 | 4 | 1 |
6 | 3 | 2 | 2 | 3 | 2 |
7 | 2 | NA | 3 | 4 | 1 |
8 | 3 | NA | 3 | 4 | 1 |
9 | NA | 1 | NA | 1 | 4 |
10 | 1 | NA | 2 | 4 | 1 |
What I want to do is to flip it back to create something like this:
id | year | municip | move |
---|---|---|---|
1 | 2009 | 2 | NA |
1 | 2010 | 1 | 3 |
1 | 2011 | 3 | 3 |
2 | 2009 | 1 | NA |
2 | 2010 | 1 | 2 |
2 | 2011 | 3 | 3 |
3 | 2009 | NA | NA |
3 | 2010 | NA | NA |
3 | 2011 | NA | NA |
I'm not sure if this can be done with just pivot_longer on its own. I tried a couple of variations. Any ideas?
You can try this:
df <- tribble(~id, ~municip.2009, ~municip.2010, ~municip.2011, ~move.2010, ~move.2011,
1, 2, 1, 3, 3, 3,
2, 1, 1, 3, 2, 3,
3, NA, NA, NA, NA, NA,
4, 1, NA, 3, 4, 1,
5, 1, NA, 2, 4, 1,
6, 3, 2, 2, 3, 2,
7, 2, NA, 3, 4, 1,
8, 3, NA, 3, 4, 1,
9, NA, 1, NA, 1, 4,
10, 1, NA, 2, 4, 1
)
df %>%
pivot_longer(cols = -1, names_to = "temp1", values_to = "count") %>%
separate(col = temp1, c("temp2", "year")) %>%
pivot_wider(names_from = temp2, values_from = count)
pivot_longer collects municip and move in the same column; separate then splits each column name into the variable (municip or move) and the year; finally, pivot_wider spreads municip and move back into their own columns to give the final result.
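As an alternative to the three-step pipeline above, tidyr's `.value` sentinel in `names_to` can do the same reshape in a single `pivot_longer` call. The following is a minimal sketch, assuming the column names follow the `name.year` pattern from the question; the object name `long` and the two-row sample are only for illustration. The move columns are entered as character here, as they would be after the question's `case_when` step.

```r
library(tidyr)
library(tibble)

# First two rows of the wide table from the question (illustrative subset)
df <- tribble(
  ~id, ~municip.2009, ~municip.2010, ~municip.2011, ~move.2010, ~move.2011,
  1, 2, 1, 3, "3", "3",
  2, 1, 1, 3, "2", "3"
)

# ".value" tells pivot_longer that the first part of each column name
# (municip / move) names an output column, while the second part is the year.
long <- pivot_longer(
  df,
  cols = -id,
  names_to = c(".value", "year"),
  names_sep = "\\."
)

print(long)
```

Because there is no move.2009 column, the 2009 rows get NA for move. The year column comes back as character, so convert it with `as.integer` if you need a number.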
Don't think sideways, think longways!
Now, I cannot answer your question completely, because I don't really understand what you are calculating. Is it some sort of factor (1-4)? But I believe you can finish this yourself. Consider the following:
> mydata %>% group_by(id) %>%
arrange(year) %>%
mutate(last_year = lag(municip)) %>%
ungroup %>%
arrange(id) %>% as.data.frame # ignore this line, it is simply for the pleasure of seeing the data.frame
id year municip last_year
1 1 2009 3 NA
2 1 2010 2 3
3 1 2011 NA 2
4 2 2009 NA NA
5 2 2010 NA NA
6 2 2011 1 NA
7 3 2009 3 NA
8 3 2010 2 3
9 3 2011 2 2
10 4 2009 2 NA
11 4 2010 NA 2
12 4 2011 1 NA
13 5 2009 3 NA
14 5 2010 NA 3
15 5 2011 2 NA
16 6 2009 1 NA
17 6 2010 3 1
18 6 2011 2 3
19 7 2009 3 NA
20 7 2010 2 3
21 7 2011 2 2
22 8 2009 NA NA
23 8 2010 NA NA
24 8 2011 3 NA
25 9 2009 1 NA
26 9 2010 NA 1
27 9 2011 1 NA
28 10 2009 3 NA
29 10 2010 NA 3
30 10 2011 NA NA
You see? In long form, you can now simply continue with
%>% mutate(move = case_when(
is.na(.$municip) & is.na(.$last_year) ~ "NA",
# etc.
))
Did you want the comparison from year i to the following year? Use the function lead instead of lag.
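Lastly, a note on your text-code: inside mutate, case_when can refer to columns by their bare names. The .$ prefix used above also works on ungrouped data, but it is not required, and on grouped data it refers to the whole column rather than the current group.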
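The long-form approach above can be sketched end to end as follows. This is only a sketch under assumptions: the codes ("1" to "4" and the string "NA") are copied from the question's comments, the first year per id is set to a real missing value to match the desired output, and the names `prev` and `mydata_moves` are made up for the example. The sample data is the first three ids from the question's table rather than a random draw.

```r
library(dplyr)

# First three ids from the question's long table (fixed, not sampled)
mydata <- data.frame(
  id      = rep(1:3, each = 3),
  year    = rep(2009:2011, times = 3),
  municip = c(2, 1, 3, 1, 1, 3, NA, NA, NA)
)

mydata_moves <- mydata %>%
  group_by(id) %>%
  arrange(year, .by_group = TRUE) %>%
  mutate(
    prev = lag(municip),                        # municipality in the previous year
    move = case_when(
      row_number() == 1            ~ NA_character_, # no previous year to compare
      is.na(prev) & is.na(municip) ~ "NA",      # missing in both years (string, as in the question)
      is.na(prev) & !is.na(municip) ~ "1",      # moved into region
      !is.na(prev) & is.na(municip) ~ "4",      # moved out of region
      prev != municip              ~ "3",       # moved within region
      TRUE                         ~ "2"        # stayed
    )
  ) %>%
  ungroup() %>%
  select(-prev)

print(mydata_moves)
```

No pivoting is needed this way, and the same code covers 12 years as well as 3 because nothing depends on the year values themselves.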
Something like this?
mydata.wide %>%
pivot_longer(
cols = -id,
names_pattern = "([a-z]+?)\\.(\\d+)",
names_to = c("name", "year"),
values_to = "val",
values_transform = list(val = as.character)
) %>%
pivot_wider(
names_from = name,
values_from = val
) %>%
print(n=30)
# A tibble: 30 × 4
id year municip move
<int> <chr> <chr> <chr>
1 1 2009 2 NA
2 1 2010 3 3
3 1 2011 NA 4
4 2 2009 2 NA
5 2 2010 NA 4
6 2 2011 2 1
7 3 2009 1 NA
8 3 2010 2 3
9 3 2011 1 3
10 4 2009 NA NA
11 4 2010 NA NA
12 4 2011 1 1
13 5 2009 NA NA
14 5 2010 2 1
15 5 2011 3 3
16 6 2009 3 NA
17 6 2010 3 2
18 6 2011 3 2
19 7 2009 NA NA
20 7 2010 NA NA
21 7 2011 NA NA
22 8 2009 NA NA
23 8 2010 2 1
24 8 2011 NA 4
25 9 2009 3 NA
26 9 2010 2 3
27 9 2011 NA 4
28 10 2009 2 NA
29 10 2010 3 3
30 10 2011 1 3