I have a bit of annoying coding problem, about which I'd appreciate your assistance.
Here are some sample data:
ID "2013-03-20" "2013-04-09" "2013-04-11" "2013-04-17"
5167f 0 0 0 1
1214m 0 0 0 0
1844f 0 1 1 0
2113m 0 0 1 1
Here's the rub: notice how the column names of the sample data frame comprise dates, and some dates can be close to one another. The ID
column is a simple unique identifier for the observations.
Here are my goals:
(1) I'd like to be able to first group the columns by whether or not they each fall within a 2-week (or 14-day) range (ie that is defined by 2-week increments starting on "2013-03-20"
until "2016-12-20"
); thus, if they do fall within the same range, they'd receive the same identifier (for a new data frame; eg, Period1
for any dates that fall between "2013-03-20"
and "2013-04-03"
).
(2) Once all of the date columns are assigned to a particular period, I then want to summarize the cell data (of 0s and 1s) per period in the following way: if any 1
appears in a particular period for an individual (at all) then that individual would receive a 1
for the whole period (also, if the individual only has 0s for that period, then it would receive a 0
for that single period).
(3) Once this workflow is defined, I then want to group the dates by seasons and years based on their month codes (eg, WinterYYYY
= December, January, February; SpringYYYY
= March, April, May; SummerYYYY
= June, July, August; and FallYYYY
= September, October, November) to produce a new data frame.
In summary, to demonstrate the products manually:
(End Product for Goals 1 and 2; ie for just first two columns in the sample data [date ranges in parentheses are just for a guide])
ID Period1 ("2013-03-20" - "2013-04-03") Period2 ("2013-04-04" - "2013-04-18")
5167f 0 1
1214m 0 0
1844f 0 1
2113m 0 1
(End Product for Goals 2 and 3; ie for all columns in the sample data [month ranges in parentheses are just for a guide])
ID Spring2013 (March - May)
5167f 1
1214m 0
1844f 1
2113m 1
Perhaps, something from the dplyr
package could be useful, but I'm not quite sure.
Thank you for any help, in advance. Please feel free to ask any follow-up questions for clarifications.
-AD-
Please put your data in a tidy format first.
library(dplyr)
data <- gather(data, date, value, -ID )
Then try:
library(lubridate)
data$date <- ymd(data$date)
data <- mutate(data, period = date - as.Date("2013-03-20")) #difference in days
data <- mutate(data, period2 = ceiling(as.numeric(data$period)/14))
data$period2 <- ifelse(data$period2 == 0, 1, data$period2) #change period 0 to period 1
newdat <- data %>%
group_by(ID, period2) %>%
summarise(result = ifelse(sum(value)>0, 1, 0))
Use the spread()
function to change back to the original format.
Here is a solution using functions from tidyverse
.
# Load packages
library(tidyverse)
library(data.table)
library(lubridate)
# Create example data frames
dt <- fread("ID '2013-03-20' '2013-04-09' '2013-04-11' '2013-04-17'
5167f 0 0 0 1
1214m 0 0 0 0
1844f 0 1 1 0
2113m 0 0 1 1")
The key is to prepare a table showing the association between date and grouping variable, such as period, month, or season. In this example, dt_merge
is such table.
dt_merge <- data_frame(
# Create a column showing the beginning date
Date1 = seq(from = ymd("2013-03-20"), to = ymd("2016-12-20"), by = "2 weeks")) %>%
# Create a column showing the end date of each period
mutate(Date2 = lead(Date1)) %>%
# Adjust Date1
mutate(Date1 = if_else(Date1 == ymd("2013-03-20"), Date1, Date1 + 1)) %>%
# Remove the last row
drop_na(Date2) %>%
# Create date list
mutate(Dates = map2(Date1, Date2, function(x, y){ seq(x, y, by = "day") })) %>%
unnest() %>%
# Create Group ID
mutate(RunID = group_indices_(., dots. = c("Date1", "Date2"))) %>%
# Create Period ID
mutate(Period = paste0("Period", RunID)) %>%
# Add a column showing Month
mutate(Month = month(Dates)) %>%
# Add a column showing Year
mutate(Year = year(Dates)) %>%
# Add a column showing season
mutate(Season = case_when(
Month %in% 3:5 ~ "Spring",
Month %in% 6:8 ~ "Summer",
Month %in% 9:11 ~ "Fall",
Month %in% c(12, 1, 2) ~ "Winter",
TRUE ~ NA_character_
)) %>%
# Combine Season and Year
mutate(SeasonYear = paste0(Season, Year)) %>%
select(-Date1, -Date2, -RunID)
After this step, it becomes easy to generate the output you want to. In this example, dt3
is the first end product. dt4
is the second product.
dt2 <- dt %>%
# Reshape the data frame
gather(Date, Value, -ID) %>%
# Convert Date to date class
mutate(Date = ymd(Date)) %>%
# Join dt_merge
left_join(dt_merge, by = c("Date" = "Dates"))
# Product 1
dt3 <- dt2 %>%
group_by(ID, Period) %>%
summarise(Value = max(Value)) %>%
spread(Period, Value)
# Product 2
dt4 <- dt2
group_by(ID, SeasonYear) %>%
summarise(Value = max(Value)) %>%
spread(SeasonYear, Value)
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.