
How to calculate interpregnancy interval by mother ID?

I am trying to calculate the interpregnancy interval between births for each mother in my data set, using the mother ID, in R.

This needs to take into account that a mother can have multiple births (e.g. 2 births or 10 births), and that some pregnancies may have ended in miscarriage or stillbirth. In addition, some pregnancies may be multiple (e.g. twins, triplets), as is the case for the mother with ID 3.

Below is an example of the dataset.

Mother_ID Last mentrual period Birth_date Nr_fetuses Preg_outcome Gestational_age Child_ID
1 1996-04-15 1996-12-08 1 Livebirth 237 C1
2 2018-06-01 2019-02-18 1 Livebirth 262 C2
3 2002-08-23 2003-05-07 1 Livebirth 257 C3
3 1998-04-22 1999-01-15 2 LiveBirth 268 C4
3 1998-04-22 1999-01-15 2 Livebirth 268 C5
3 1992-02-21 1992-11-22 1 Livebirth 275 C6
4 2006-02-28 2006-11-18 1 Livebirth 263 C7
4 2003-01-31 2003-11-12 1 Livebirth 285 C8
4 2005-01-04 2005-03-18 1 Miscarriage 73
5 2009-04-08 2009-06-06 1 Miscarriage 59
5 2009-08-01 2010-05-02 1 Stillbirth 274 C9
6 1992-02-02 1992-09-05 1 Stillbirth 216
6 1995-02-21 1995-11-13 1 Livebirth 265 C10
6 1990-02-08 1990-11-07 1 Livebirth 272 C11

The output would show the pregnancy intervals, in days, ordered by date of birth for each mother. Could you please suggest ways of achieving this, ideally with base R?

The output would be something like this:

Line Mother_ID Last mentrual period Birth_date Nr_fetuses Preg_outcome Gestational_age Child_ID Inter_preg_inter (days)
1 1 1996-04-15 1996-12-08 1 Livebirth 237 C1 0
2 2 2018-06-01 2019-02-18 1 Livebirth 262 C2 0
3 3 1992-02-21 1992-11-22 1 Livebirth 275 C6 0
4 3 1998-04-22 1999-01-15 2 LiveBirth 268 C4 1977
5 3 1998-04-22 1999-01-15 2 Livebirth 268 C5 1977
6 3 2002-08-23 2003-05-07 1 Livebirth 257 C3 1316
7 4 2003-01-31 2003-11-12 1 Livebirth 285 C8 0
8 4 2005-01-04 2005-03-18 1 Miscarriage 73 419
9 4 2006-02-28 2006-11-18 1 Livebirth 263 C7 347
10 5 2009-04-08 2009-06-06 1 Miscarriage 59 0
11 5 2009-08-01 2010-05-02 1 Stillbirth 274 C9 56
12 6 1990-02-08 1990-11-07 1 Livebirth 272 C11 0
13 6 1992-02-02 1992-09-05 1 Stillbirth 216 452
14 6 1995-02-21 1995-11-13 1 Livebirth 265 C10 899

So: Pregnancy interval = Birth date of the current pregnancy (line 4) - Birth date of the previous pregnancy (line 3) - Gestational age of the current pregnancy (e.g. Mother_ID 3 ---> 1999-01-15 - 1992-11-22 - 268 = 1977 days)

OR

Pregnancy interval = Date of the last menstrual period of the current pregnancy (line 4) - Birth date of the previous pregnancy (line 3) (e.g. Mother_ID 3 ---> 1998-04-22 - 1992-11-22 = 1977 days).
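As a quick sanity check, the two formulas can be evaluated in base R for the Mother_ID 3 example. This is only a sketch: the variable names are made up for illustration, and the two results agree here because Gestational_age equals Birth_date minus the last menstrual period for this row.

```r
# Dates for Mother_ID 3: the twin pregnancy (C4/C5) and the birth before it (C6)
prev_birth <- as.Date("1992-11-22")
curr_birth <- as.Date("1999-01-15")
curr_lmp   <- as.Date("1998-04-22")
gest_age   <- 268  # days

# Formula 1: current birth - previous birth - gestational age
interval_1 <- as.numeric(curr_birth - prev_birth) - gest_age

# Formula 2: current last menstrual period - previous birth
interval_2 <- as.numeric(curr_lmp - prev_birth)

interval_1  # 1977
interval_2  # 1977
```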

Sorted by mother ID and order of birth.

This might help you move forward, though there may be certain circumstances where you'd want to modify it further (e.g. twins where only one child survives, or multiple births spread across consecutive dates either side of midnight).

First, you can sort your data by Birth_date and group by Mother_ID . You can then create a pregnancy number to count pregnancies and allow for grouping, where rows whose Birth_date values are no more than 1 day apart are treated as the same "pregnancy".

Then, grouping by both Mother_ID and this new pregnancy number Preg_num , keep only one row of data (ignore that one or more children would be omitted, for now). After that, group again by Mother_ID and calculate intervals between pregnancies.

Finally, you can right_join back to the original data.

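library(tidyverse)

# Ensure Birth_date is a Date so that date arithmetic works
df$Birth_date <- as.Date(df$Birth_date)

df %>%
  arrange(Mother_ID, Birth_date) %>%
  group_by(Mother_ID) %>%
  # Start a new pregnancy number whenever the gap to the previous row exceeds 1 day
  mutate(Preg_num = cumsum(Birth_date - lag(Birth_date, default = first(Birth_date)) > 1) + 1) %>%
  group_by(Mother_ID, Preg_num) %>%
  # Keep one row per pregnancy so that multiples are counted once
  slice(1) %>%
  group_by(Mother_ID) %>%
  # Interval = this birth - previous birth - gestational age (0 for the first pregnancy)
  mutate(Inter_preg_inter = ifelse(
    Preg_num == 1,
    0,
    Birth_date - lag(Birth_date) - Gestational_age
  )) %>%
  ungroup() %>%
  # Drop the per-child columns, then join back so every child row gets its interval
  select(-c(Preg_outcome, Child_ID)) %>%
  right_join(df, by = c("Mother_ID", "Lastmentrualperiod", "Birth_date", "Nr_fetuses", "Gestational_age"))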

Output

   Mother_ID Lastmentrualperiod Birth_date Nr_fetuses Gestational_age Preg_num Inter_preg_inter Preg_outcome Child_ID
       <int> <chr>              <date>          <int>           <int>    <dbl>            <dbl> <chr>        <chr>   
 1         1 1996-04-15         1996-12-08          1             237        1                0 Livebirth    C1      
 2         2 2018-06-01         2019-02-18          1             262        1                0 Livebirth    C2      
 3         3 1992-02-21         1992-11-22          1             275        1                0 Livebirth    C6      
 4         3 1998-04-22         1999-01-15          2             268        2             1977 LiveBirth    C4      
 5         3 1998-04-22         1999-01-15          2             268        2             1977 Livebirth    C5      
 6         3 2002-08-23         2003-05-07          1             257        3             1316 Livebirth    C3      
 7         4 2003-01-31         2003-11-12          1             285        1                0 Livebirth    C8      
 8         4 2005-01-04         2005-03-18          1              73        2              419 Miscarriage  NA      
 9         4 2006-02-28         2006-11-18          1             263        3              347 Livebirth    C7      
10         5 2009-04-08         2009-06-06          1              59        1                0 Miscarriage  NA      
11         5 2009-08-01         2010-05-02          1             274        2               56 Stillbirth   C9      
12         6 1990-02-08         1990-11-07          1             272        1                0 Livebirth    C11     
13         6 1992-02-02         1992-09-05          1             216        2              452 Stillbirth   NA      
14         6 1995-02-21         1995-11-13          1             265        3              899 Livebirth    C10
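The Preg_num column comes from a cumulative sum over the gaps between consecutive birth dates. A small base R sketch of that one step in isolation may make it clearer; the dates below are hypothetical (twins delivered either side of midnight, then a later singleton) and are not taken from the example data.

```r
# Hypothetical birth dates for one mother, already sorted
birth_date <- as.Date(c("1999-01-15", "1999-01-16", "2003-05-07"))

# Gap in days to the previous row; the first row has no predecessor, so use 0
gap <- c(0, as.numeric(diff(birth_date)))

# A new pregnancy starts wherever the gap exceeds 1 day
preg_num <- cumsum(gap > 1) + 1

preg_num  # 1 1 2
```

With a threshold of 1 day, the two deliveries on consecutive dates share a pregnancy number, which is the behaviour the answer relies on for multiples.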

Data

df <- structure(list(Mother_ID = c(1L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 
4L, 5L, 5L, 6L, 6L, 6L), Lastmentrualperiod = c("1996-04-15", 
"2018-06-01", "2002-08-23", "1998-04-22", "1998-04-22", "1992-02-21", 
"2006-02-28", "2003-01-31", "2005-01-04", "2009-04-08", "2009-08-01", 
"1992-02-02", "1995-02-21", "1990-02-08"), Birth_date = structure(c(9838, 
17945, 12179, 10606, 10606, 8361, 13470, 12368, 12860, 14401, 
14731, 8283, 9447, 7615), class = "Date"), Nr_fetuses = c(1L, 
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Preg_outcome = c("Livebirth", 
"Livebirth", "Livebirth", "LiveBirth", "Livebirth", "Livebirth", 
"Livebirth", "Livebirth", "Miscarriage", "Miscarriage", "Stillbirth", 
"Stillbirth", "Livebirth", "Livebirth"), Gestational_age = c(237L, 
262L, 257L, 268L, 268L, 275L, 263L, 285L, 73L, 59L, 274L, 216L, 
265L, 272L), Child_ID = c("C1", "C2", "C3", "C4", "C5", "C6", 
"C7", "C8", NA, NA, "C9", NA, "C10", "C11")), row.names = c(NA, 
-14L), class = "data.frame")
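Since the question asked for base R, the same idea might be sketched without the tidyverse as follows. This is an illustration under stated assumptions, not a drop-in replacement: inter_preg is a hypothetical helper name, the data frame is rebuilt with only the columns the calculation needs, and multiples are detected by an identical Birth_date rather than by the 1-day threshold used above.

```r
# Minimal copy of the example data (only the columns the calculation needs)
df <- data.frame(
  Mother_ID = c(1, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 6, 6),
  Birth_date = as.Date(c(
    "1996-12-08", "2019-02-18", "2003-05-07", "1999-01-15", "1999-01-15",
    "1992-11-22", "2006-11-18", "2003-11-12", "2005-03-18", "2009-06-06",
    "2010-05-02", "1992-09-05", "1995-11-13", "1990-11-07"
  )),
  Gestational_age = c(237, 262, 257, 268, 268, 275, 263, 285, 73, 59,
                      274, 216, 265, 272)
)

# Hypothetical helper: intervals for one mother's rows, sorted by Birth_date
inter_preg <- function(d) {
  # Assumption: rows sharing a Birth_date belong to the same pregnancy
  births <- unique(d$Birth_date)
  # Birth date of the previous pregnancy; NA for the first pregnancy
  prev <- c(as.Date(NA), head(births, -1))
  # Map every row to the birth date preceding its own pregnancy
  prev_row <- prev[match(d$Birth_date, births)]
  out <- as.numeric(d$Birth_date - prev_row) - d$Gestational_age
  out[is.na(out)] <- 0  # first pregnancy gets 0, as in the question
  out
}

# Sort by mother and birth date, then apply the helper to each mother
df <- df[order(df$Mother_ID, df$Birth_date), ]
df$Inter_preg_inter <- unlist(
  lapply(split(df, df$Mother_ID), inter_preg),
  use.names = FALSE
)
rownames(df) <- NULL

df
```

Because split() returns the groups in Mother_ID order and the data frame is sorted the same way beforehand, the concatenated result lines up with the rows of df.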
