I am trying to calculate the interval between pregnancies for each mother in my data set, using the mother ID, in R.
A mother can have multiple births (e.g. 2 births or 10 births), and some of the pregnancies may have ended in miscarriage or stillbirth. In addition, some of the pregnancies are multiple (e.g. twins, triplets), as is the case for the mother with ID 3.
Below is an example of the data set.
Mother_ID | Last mentrual period | Birth_date | Nr_fetuses | Preg_outcome | Gestational_age | Child_ID |
---|---|---|---|---|---|---|
1 | 1996-04-15 | 1996-12-08 | 1 | Livebirth | 237 | C1 |
2 | 2018-06-01 | 2019-02-18 | 1 | Livebirth | 262 | C2 |
3 | 2002-08-23 | 2003-05-07 | 1 | Livebirth | 257 | C3 |
3 | 1998-04-22 | 1999-01-15 | 2 | LiveBirth | 268 | C4 |
3 | 1998-04-22 | 1999-01-15 | 2 | Livebirth | 268 | C5 |
3 | 1992-02-21 | 1992-11-22 | 1 | Livebirth | 275 | C6 |
4 | 2006-02-28 | 2006-11-18 | 1 | Livebirth | 263 | C7 |
4 | 2003-01-31 | 2003-11-12 | 1 | Livebirth | 285 | C8 |
4 | 2005-01-04 | 2005-03-18 | 1 | Miscarriage | 73 | |
5 | 2009-04-08 | 2009-06-06 | 1 | Miscarriage | 59 | |
5 | 2009-08-01 | 2010-05-02 | 1 | Stillbirth | 274 | C9 |
6 | 1992-02-02 | 1992-09-05 | 1 | Stillbirth | 216 | |
6 | 1995-02-21 | 1995-11-13 | 1 | Livebirth | 265 | C10 |
6 | 1990-02-08 | 1990-11-07 | 1 | Livebirth | 272 | C11 |
The output would show the pregnancy intervals, in days, ordered by date of birth for each mother. Could you please suggest ways of achieving this, ideally with base R?
The output would be something like this:
Line | Mother_ID | Last mentrual period | Birth_date | Nr_fetuses | Preg_outcome | Gestational_age | Child_ID | Inter_preg_inter (days) |
---|---|---|---|---|---|---|---|---|
1 | 1 | 1996-04-15 | 1996-12-08 | 1 | Livebirth | 237 | C1 | 0 |
2 | 2 | 2018-06-01 | 2019-02-18 | 1 | Livebirth | 262 | C2 | 0 |
3 | 3 | 1992-02-21 | 1992-11-22 | 1 | Livebirth | 275 | C6 | 0 |
4 | 3 | 1998-04-22 | 1999-01-15 | 2 | LiveBirth | 268 | C4 | 1977 |
5 | 3 | 1998-04-22 | 1999-01-15 | 2 | Livebirth | 268 | C5 | 1977 |
6 | 3 | 2002-08-23 | 2003-05-07 | 1 | Livebirth | 257 | C3 | 1316 |
7 | 4 | 2003-01-31 | 2003-11-12 | 1 | Livebirth | 285 | C8 | 0 |
8 | 4 | 2005-01-04 | 2005-03-18 | 1 | Miscarriage | 73 | | 419 |
9 | 4 | 2006-02-28 | 2006-11-18 | 1 | Livebirth | 263 | C7 | 347 |
10 | 5 | 2009-04-08 | 2009-06-06 | 1 | Miscarriage | 59 | | 0 |
11 | 5 | 2009-08-01 | 2010-05-02 | 1 | Stillbirth | 274 | C9 | 56 |
12 | 6 | 1990-02-08 | 1990-11-07 | 1 | Livebirth | 272 | C11 | 0 |
13 | 6 | 1992-02-02 | 1992-09-05 | 1 | Stillbirth | 216 | | 452 |
14 | 6 | 1995-02-21 | 1995-11-13 | 1 | Livebirth | 265 | C10 | 899 |
So: Pregnancy interval = Birth date of the current pregnancy (line 4) - Birth date of the previous pregnancy (line 3) - Gestational age of the current pregnancy (e.g. Mother_ID 3: 1999-01-15 - 1992-11-22 - 268 = 1977 days)
OR
Pregnancy interval = Date of the last menstrual period of the current pregnancy (line 4) - Birth date of the previous pregnancy (line 3) (e.g. Mother_ID 3: 1998-04-22 - 1992-11-22 = 1977 days).
The result should be sorted by mother ID and order of birth.
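As a quick check of the arithmetic, the two formulas can be evaluated in base R for Mother_ID 3, taking 1998-04-22 as the last menstrual period of the twin pregnancy. This is only a sketch; the variable names are made up for illustration, and the two results agree only when Gestational_age equals the number of days between the last menstrual period and the birth date.

```r
# Dates for Mother_ID 3: the first birth, then the twin pregnancy that followed
prev_birth <- as.Date("1992-11-22")
next_lmp   <- as.Date("1998-04-22")  # last menstrual period of the twin pregnancy
next_birth <- as.Date("1999-01-15")
gest_age   <- 268                     # days, as recorded in the table

# Formula 1: birth-to-birth gap minus gestational age
interval_1 <- as.numeric(next_birth - prev_birth) - gest_age

# Formula 2: previous birth to the next last menstrual period
interval_2 <- as.numeric(next_lmp - prev_birth)

interval_1  # 1977
interval_2  # 1977
```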
This might help you move forward, though there may be circumstances where you'd want to modify it further (e.g. twins where only one child survives, or multiple births spread across consecutive dates past midnight).
First, you can sort your data by `Birth_date` and group by `Mother_ID`. You can create a pregnancy number to count pregnancies and allow for grouping, where rows belong to the same pregnancy when their `Birth_date` values are not more than 1 day apart.
Then, grouping by both `Mother_ID` and this new pregnancy number `Preg_num`, keep only one row of data (ignore, for now, that one or more children would be omitted). After that, group again by `Mother_ID` and calculate the intervals between pregnancies.
Finally, you can `right_join` back to the original data.
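The pregnancy-numbering step may be easier to follow in isolation. The sketch below uses hypothetical birth dates for a single mother (the twins delivered either side of midnight are invented for illustration) and plain base R rather than dplyr.

```r
# Hypothetical birth dates for one mother, already sorted:
# a singleton, twins delivered either side of midnight, then another singleton
birth_date <- as.Date(c("1992-11-22", "1999-01-15", "1999-01-16", "2003-05-07"))

# Gap in days to the previous row; the first row gets a gap of 0
gap_days <- c(0, as.numeric(diff(birth_date)))

# A new pregnancy starts whenever the gap exceeds 1 day
preg_num <- cumsum(gap_days > 1) + 1

preg_num  # 1 2 2 3
```

The two twin rows share pregnancy number 2 because their dates are only 1 day apart, so the running count does not increase between them.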
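library(tidyverse)

# Make sure Birth_date is a Date so that subtraction gives days
df$Birth_date <- as.Date(df$Birth_date)

df %>%
  arrange(Mother_ID, Birth_date) %>%
  group_by(Mother_ID) %>%
  # Start a new pregnancy whenever the gap to the previous row exceeds 1 day
  mutate(Preg_num = cumsum(Birth_date - lag(Birth_date, default = first(Birth_date)) > 1) + 1) %>%
  # Keep one row per pregnancy (twins collapse to a single row)
  group_by(Mother_ID, Preg_num) %>%
  slice(1) %>%
  group_by(Mother_ID) %>%
  # Interval = gap between consecutive births minus gestational age; 0 for the first pregnancy
  mutate(Inter_preg_inter = ifelse(
    Preg_num == 1,
    0,
    Birth_date - lag(Birth_date) - Gestational_age
  )) %>%
  ungroup() %>%
  # Drop the child-specific columns, then join back so every child row gets its interval
  select(-c(Preg_outcome, Child_ID)) %>%
  right_join(df, by = c("Mother_ID", "Lastmentrualperiod", "Birth_date", "Nr_fetuses", "Gestational_age"))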
Output
Mother_ID Lastmentrualperiod Birth_date Nr_fetuses Gestational_age Preg_num Inter_preg_inter Preg_outcome Child_ID
<int> <chr> <date> <int> <int> <dbl> <dbl> <chr> <chr>
1 1 1996-04-15 1996-12-08 1 237 1 0 Livebirth C1
2 2 2018-06-01 2019-02-18 1 262 1 0 Livebirth C2
3 3 1992-02-21 1992-11-22 1 275 1 0 Livebirth C6
4 3 1998-04-22 1999-01-15 2 268 2 1977 LiveBirth C4
5 3 1998-04-22 1999-01-15 2 268 2 1977 Livebirth C5
6 3 2002-08-23 2003-05-07 1 257 3 1316 Livebirth C3
7 4 2003-01-31 2003-11-12 1 285 1 0 Livebirth C8
8 4 2005-01-04 2005-03-18 1 73 2 419 Miscarriage NA
9 4 2006-02-28 2006-11-18 1 263 3 347 Livebirth C7
10 5 2009-04-08 2009-06-06 1 59 1 0 Miscarriage NA
11 5 2009-08-01 2010-05-02 1 274 2 56 Stillbirth C9
12 6 1990-02-08 1990-11-07 1 272 1 0 Livebirth C11
13 6 1992-02-02 1992-09-05 1 216 2 452 Stillbirth NA
14 6 1995-02-21 1995-11-13 1 265 3 899 Livebirth C10
Data
df <- structure(list(Mother_ID = c(1L, 2L, 3L, 3L, 3L, 3L, 4L, 4L,
4L, 5L, 5L, 6L, 6L, 6L), Lastmentrualperiod = c("1996-04-15",
"2018-06-01", "2002-08-23", "1998-04-22", "1998-04-22", "1992-02-21",
"2006-02-28", "2003-01-31", "2005-01-04", "2009-04-08", "2009-08-01",
"1992-02-02", "1995-02-21", "1990-02-08"), Birth_date = structure(c(9838,
17945, 12179, 10606, 10606, 8361, 13470, 12368, 12860, 14401,
14731, 8283, 9447, 7615), class = "Date"), Nr_fetuses = c(1L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), Preg_outcome = c("Livebirth",
"Livebirth", "Livebirth", "LiveBirth", "Livebirth", "Livebirth",
"Livebirth", "Livebirth", "Miscarriage", "Miscarriage", "Stillbirth",
"Stillbirth", "Livebirth", "Livebirth"), Gestational_age = c(237L,
262L, 257L, 268L, 268L, 275L, 263L, 285L, 73L, 59L, 274L, 216L,
265L, 272L), Child_ID = c("C1", "C2", "C3", "C4", "C5", "C6",
"C7", "C8", NA, NA, "C9", NA, "C10", "C11")), row.names = c(NA,
-14L), class = "data.frame")
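Since the question asked for base R if possible, here is a rough base R sketch of the same idea. It is not a drop-in replacement: `dat` is a trimmed copy of the example data (mothers 3 and 4 only) and `interval_one_mother` is a helper name invented here. It assumes, as above, that rows no more than 1 day apart belong to the same pregnancy.

```r
# Trimmed copy of the example data (mothers 3 and 4 only)
dat <- data.frame(
  Mother_ID       = c(3L, 3L, 3L, 3L, 4L, 4L, 4L),
  Birth_date      = as.Date(c("2003-05-07", "1999-01-15", "1999-01-15",
                              "1992-11-22", "2006-11-18", "2003-11-12",
                              "2005-03-18")),
  Gestational_age = c(257L, 268L, 268L, 275L, 263L, 285L, 73L),
  Child_ID        = c("C3", "C4", "C5", "C6", "C7", "C8", NA)
)

# Sort by mother, then by birth date
dat <- dat[order(dat$Mother_ID, dat$Birth_date), ]

# Hypothetical helper: intervals for one mother's rows (sorted by birth date)
interval_one_mother <- function(birth, gest) {
  gap <- c(0, as.numeric(diff(birth)))   # days since the previous row
  preg_num <- cumsum(gap > 1) + 1        # same pregnancy if <= 1 day apart
  first_row <- !duplicated(preg_num)     # one representative row per pregnancy
  b <- birth[first_row]
  g <- gest[first_row]
  # 0 for the first pregnancy, then birth-to-birth gap minus gestational age
  per_preg <- c(0, as.numeric(diff(b)) - g[-1])
  per_preg[preg_num]                     # expand back to one value per row
}

# Apply the helper within each mother and stack the pieces again
pieces <- lapply(split(dat, dat$Mother_ID), function(d) {
  d$Inter_preg_inter <- interval_one_mother(d$Birth_date, d$Gestational_age)
  d
})
result <- do.call(rbind, pieces)
rownames(result) <- NULL

result$Inter_preg_inter  # 0 1977 1977 1316 0 419 347
```

Because the interval is expanded back to one value per row, both twins (C4 and C5) carry the same interval and no join is needed.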