Analysing count data in R

Question

This is a follow-up to a previous question. I have data on ~2000 people with repeated measurements over multiple years between 2000 and 2022 (some people have data for the full period, others only for a subset of these years). Within a single year, each person falls into exactly one of four groups: 0, 1, 2, or 3. Following my previous question, I can now count the number of times each person changes group within their sampling period using this code:

df %>%
  count(ID, wt = diff(CultGroup) != 0)

This is a subset of the data for the first 20 people sampled:

structure(list(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 
2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6, 
7, 7, 8, 9, 9, 9, 9, 9, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
11, 11, 11, 11, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 
14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 
16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 18, 18, 
18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 
20), CultGroup = c(1, 1, 1, 1, 1, 1, 3, 3, 3, 1, 3, 3, 0, 1, 
3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 2, 0, 0, 1, 1, 1, 1, 3, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 3, 
1, 0, 2, 0, 0, 1, 2, 1, 0, 2, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 0, 0, 0, 
0, 0, 3, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 2, 2, 
3, 3, 3, 3, 3, 3, 1, 0, 0, 3, 0, 3, 3, 2, 2, 3, 2, 3, 3, 3, 0, 
0, 0, 0, 0, 0, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 
3, 3, 0, 0, 0, 0, 0, 1, 1), Year = c(2010, 2011, 2012, 2013, 
2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2001, 2002, 2003, 
2004, 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 
2013, 2014, 2015, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 
2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 
2020, 2021, 2001, 2002, 2002, 2003, 2004, 2009, 2010, 2011, 2009, 
2010, 2011, 2012, 2013, 2020, 2021, 2013, 2014, 2015, 2016, 2017, 
2018, 2019, 2020, 2021, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 
2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 
2019, 2006, 2007, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 
2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 
2022, 2009, 2011, 2012, 2013, 2014, 2015, 2017, 2018, 2019, 2020, 
2001, 2002, 2003, 2004, 2005, 2007, 2008, 2011, 2002, 2003, 2004, 
2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2016, 2017, 
2018, 2019, 2020, 2021, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 
2008, 2009, 2010, 2011, 2012, 2010, 2011, 2012, 2013, 2013, 2014, 
2015)), row.names = c(NA, -170L), class = c("tbl_df", "tbl", 
"data.frame"))

However, I now want to know more about the nature of these changes. I would like to know whether each person's changes are mostly one-way moves from one group to another (e.g. 1 to 2), or whether there is a lot of back-and-forth (e.g. from group 1 to 2 and back to 1 again). Is there a good way to plot or otherwise visualise the changes in grouping for each person? And are there any statistics you would recommend to quantify the nature of these changes?

Thanks!

Answer 1

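A simple visualisation option: one bar chart per person, with Year on the x-axis and CultGroup as the bar height.

library(tidyverse)

# mydata is the data frame from the question.
# Note: years in group 0 are drawn as zero-height bars.
ggplot(data = mydata, aes(x = Year, y = CultGroup)) +
  geom_col() +
  facet_wrap(~ID, ncol = 5)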
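
To put numbers on the nature of the changes, one possible approach (a sketch, not the only option) is to record the group before and after each change, tabulate those pairs, and flag changes that return a person to the group they had just left. The toy data frame below is made up for illustration, and the column names from, to and reversal, as well as the objects changes, transition_counts and per_person, are my own. It assumes consecutive observations within a person are compared even when some years are missing in between.

```r
library(dplyr)

# Toy data standing in for the question's data frame (values are made up).
df <- tibble::tribble(
  ~ID, ~Year, ~CultGroup,
  1, 2010, 1,
  1, 2011, 3,
  1, 2012, 1,
  1, 2013, 1,
  2, 2001, 0,
  2, 2002, 1,
  2, 2003, 3
)

# One row per actual change, with the group before (from) and after (to).
# reversal is TRUE when a change goes back to the group the person held
# before their previous change (e.g. 1 -> 3 followed by 3 -> 1).
changes <- df %>%
  arrange(ID, Year) %>%
  group_by(ID) %>%
  mutate(from = lag(CultGroup), to = CultGroup) %>%
  filter(!is.na(from), from != to) %>%
  mutate(reversal = !is.na(lag(from)) & to == lag(from)) %>%
  ungroup()

# Cross-tabulate from/to over all people: a matrix of transition counts.
transition_counts <- table(from = changes$from, to = changes$to)

# Per person: total number of changes and how many of them are reversals.
per_person <- changes %>%
  group_by(ID) %>%
  summarise(n_changes = n(), n_reversals = sum(reversal))

print(transition_counts)
print(per_person)
```

A person whose n_reversals is close to n_changes is mostly flipping back and forth, while a person with several changes and no reversals is moving through different groups. Dividing each row of transition_counts by its row sum (for example with prop.table(transition_counts, 1)) gives empirical transition proportions, which could be a starting point for a more formal Markov-chain style analysis.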

solution1
2 2022-05-18 11:07:47