
Analysing counting data in R

This is a follow-up to a previous question, where I explained that I have data on ~2000 people with repeated measurements over multiple years between 2000 and 2022 (some people have data for the full period, others only for a subset of these years). Within a single year, each person falls into exactly one of four groups: 0, 1, 2, or 3. Thanks to the answers to my previous question, I can now count the number of times each person changes group within their sampling period using this code:

df %>%
  count(ID, wt = diff(CultGroup) != 0)
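
Because `diff()` compares consecutive rows, this count assumes the rows are in year order within each ID. A minimal sketch of a more explicit version, which sorts first, is shown here on a small made-up data frame (`toy` and `n_changes` are illustrative names, not part of the real data):

```r
library(dplyr)

# Toy data (hypothetical): person 1's rows are deliberately out of year order
toy <- tibble(
  ID        = c(1, 1, 1, 2, 2, 2),
  Year      = c(2001, 2003, 2002, 2001, 2002, 2003),
  CultGroup = c(0, 0, 1, 2, 2, 3)
)

changes <- toy %>%
  arrange(ID, Year) %>%   # put each person's rows in year order first
  group_by(ID) %>%
  summarise(n_changes = sum(diff(CultGroup) != 0), .groups = "drop")

changes
```

Without the `arrange()` step, person 1 in this toy example would be counted as changing group once rather than twice.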

This is a subset of the data for the first 20 people sampled:

structure(list(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 
2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 
4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 6, 6, 6, 
7, 7, 8, 9, 9, 9, 9, 9, 8, 8, 10, 10, 10, 10, 10, 10, 10, 10, 
10, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 
11, 11, 11, 11, 12, 12, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 
13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 14, 14, 14, 14, 14, 14, 
14, 14, 14, 14, 15, 15, 15, 15, 15, 15, 15, 15, 16, 16, 16, 16, 
16, 16, 16, 16, 16, 16, 16, 16, 17, 17, 17, 17, 17, 17, 18, 18, 
18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 
20), CultGroup = c(1, 1, 1, 1, 1, 1, 3, 3, 3, 1, 3, 3, 0, 1, 
3, 3, 3, 3, 3, 3, 3, 0, 0, 0, 0, 0, 2, 0, 0, 1, 1, 1, 1, 3, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 1, 1, 1, 3, 
1, 0, 2, 0, 0, 1, 2, 1, 0, 2, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 3, 3, 0, 0, 0, 
0, 0, 3, 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 2, 2, 
3, 3, 3, 3, 3, 3, 1, 0, 0, 3, 0, 3, 3, 2, 2, 3, 2, 3, 3, 3, 0, 
0, 0, 0, 0, 0, 3, 3, 3, 1, 1, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 
3, 3, 0, 0, 0, 0, 0, 1, 1), Year = c(2010, 2011, 2012, 2013, 
2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2001, 2002, 2003, 
2004, 2001, 2002, 2003, 2004, 2005, 2007, 2009, 2010, 2011, 2012, 
2013, 2014, 2015, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 
2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 
2020, 2021, 2001, 2002, 2002, 2003, 2004, 2009, 2010, 2011, 2009, 
2010, 2011, 2012, 2013, 2020, 2021, 2013, 2014, 2015, 2016, 2017, 
2018, 2019, 2020, 2021, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 
2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 
2019, 2006, 2007, 2001, 2002, 2003, 2004, 2005, 2007, 2008, 2009, 
2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 
2022, 2009, 2011, 2012, 2013, 2014, 2015, 2017, 2018, 2019, 2020, 
2001, 2002, 2003, 2004, 2005, 2007, 2008, 2011, 2002, 2003, 2004, 
2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2016, 2017, 
2018, 2019, 2020, 2021, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 
2008, 2009, 2010, 2011, 2012, 2010, 2011, 2012, 2013, 2013, 2014, 
2015)), row.names = c(NA, -170L), class = c("tbl_df", "tbl", 
"data.frame"))

However, I now want to know more about the nature of these changes. I would like to know whether each person's changes are mostly one-way moves from one group to another (e.g. 1 to 2), or whether there is a lot of back-and-forth (e.g. from group 1 to 2 and back to 1 again). Is there a good way to plot or otherwise visualise the changes in grouping for each person? And are there any statistics that would be advisable for quantifying the nature of these changes?

Thanks!

A simple visualisation option:

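library(tidyverse)

# df is the data frame from the question, e.g. df <- structure(...)
# One panel per person; bar height is the group in that year (group 0 shows as no bar)
ggplot(data = df, aes(x = Year, y = CultGroup)) +
  geom_col() +
  facet_wrap(~ID, ncol = 5)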
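
To put numbers on the "one-way versus back-and-forth" part of the question, one possible approach (a sketch, not the only reasonable definition) is to collapse each person's record into runs of the same group, tabulate the from/to pairs, and count how many changes return to the group the person was in just before their previous change. The toy data and the names `runs`, `transitions`, `back_forth` and `n_returns` below are made up for illustration; only `ID`, `Year` and `CultGroup` come from the question.

```r
library(dplyr)

# Toy data (hypothetical), same column names as in the question
toy <- tibble(
  ID        = c(1, 1, 1, 1, 2, 2, 2),
  Year      = c(2001, 2002, 2003, 2004, 2001, 2002, 2003),
  CultGroup = c(1, 2, 1, 1, 0, 3, 3)
)

# Collapse consecutive years in the same group into a single run per person
runs <- toy %>%
  arrange(ID, Year) %>%
  group_by(ID) %>%
  filter(row_number() == 1 | CultGroup != lag(CultGroup)) %>%
  mutate(
    from    = lag(CultGroup),                   # group before this change
    returns = CultGroup == lag(CultGroup, 2)    # back to the group two runs ago?
  ) %>%
  ungroup()

# How often each from -> to change occurs for each person
transitions <- runs %>%
  filter(!is.na(from)) %>%
  count(ID, from, to = CultGroup, name = "n")

# Per person: total changes and how many of them are "returns"
back_forth <- runs %>%
  filter(!is.na(from)) %>%
  group_by(ID) %>%
  summarise(
    n_changes = n(),
    n_returns = sum(returns, na.rm = TRUE),
    .groups   = "drop"
  )

transitions
back_forth
```

A person whose `n_returns` is close to `n_changes` is mostly oscillating between groups, while a value near zero suggests one-way moves. Dropping `ID` from the `count()` call would give a pooled from/to table across all people, which could also be drawn as a heatmap.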

