简体   繁体   中英

Adding a column to my dataframe based on other columns

I wish to add a new column to my dataframe called weekend which contains TRUE or FASLE if the day is in the weekend or not. This can be based off the date or weekday column. How would i do this?

data snipet

      Substation  Date       cluster  Time     Value  weekday
1     511016      2013-01-17       1 00:00 0.6215941 Thursday
2     511029      2013-01-17       1 00:00 0.5677445 Thursday
3     511030      2013-01-17       1 00:00 0.6065458 Thursday
4     511033      2013-01-08       2 00:00 0.3090885  Tuesday
5     511034      2013-01-17       1 00:00 0.5263230 Thursday
6     511035      2013-01-19       1 00:00 0.5267718 Saturday

Something like this? where you replace data with your dataframe

library(tidyverse)
data %>% mutate(weekend = ifelse(weekday %in% c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday"), F, T))

or

library(tidyverse)
data %>% mutate(weekend = ifelse(weekday %in% c("Saturday", "Sunday"), T, F))

This can be informed by ?strptime (which includes % -codes for format(..) ):

     '%u' Weekday as a decimal number (1-7, Monday is 1).

Prep, in case you don't have Date objects in the dat$Date column:

dat$Date <- as.Date(dat$Date)

Code:

dat$is_weekend <- format(dat$Date, "%u") %in% 6:7
dat
#   Substation       Date cluster  Time     Value  weekday is_weekend
# 1     511016 2013-01-17       1 00:00 0.6215941 Thursday      FALSE
# 2     511029 2013-01-17       1 00:00 0.5677445 Thursday      FALSE
# 3     511030 2013-01-17       1 00:00 0.6065458 Thursday      FALSE
# 4     511033 2013-01-08       2 00:00 0.3090885  Tuesday      FALSE
# 5     511034 2013-01-17       1 00:00 0.5263230 Thursday      FALSE
# 6     511035 2013-01-19       1 00:00 0.5267718 Saturday       TRUE

Data

dat <- structure(list(Substation = c(511016L, 511029L, 511030L, 511033L, 511034L, 511035L), Date = structure(c(15722, 15722, 15722, 15713, 15722, 15724), class = "Date"), cluster = c(1L, 1L, 1L, 2L, 1L, 1L), Time = c("00:00", "00:00", "00:00", "00:00", "00:00", "00:00"), Value = c(0.6215941, 0.5677445, 0.6065458, 0.3090885, 0.526323, 0.5267718), weekday = c("Thursday", "Thursday", "Thursday", "Tuesday", "Thursday", "Saturday"), is_weekend = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)), row.names = c("1",  "2", "3", "4", "5", "6"), class = "data.frame")

Another possible solution, based on lubridate::wday :

library(tidyverse)
library(lubridate)

df %>% 
  mutate(weekend = wday(ymd(Date)) %in% c(1,7))

#>   Substation       Date cluster  Time     Value  weekday weekend
#> 1     511016 2013-01-17       1 00:00 0.6215941 Thursday   FALSE
#> 2     511029 2013-01-17       1 00:00 0.5677445 Thursday   FALSE
#> 3     511030 2013-01-17       1 00:00 0.6065458 Thursday   FALSE
#> 4     511033 2013-01-08       2 00:00 0.3090885  Tuesday   FALSE
#> 5     511034 2013-01-17       1 00:00 0.5263230 Thursday   FALSE
#> 6     511035 2013-01-19       1 00:00 0.5267718 Saturday    TRUE

Here is another possible solution using isWeekend from timeDate :

library(timeDate)

df$is_weekend <- isWeekend(as.Date(df$Date), wday = 1:5)

Output

  Substation       Date cluster  Time     Value  weekday is_weekend
1     511016 2013-01-17       1 00:00 0.6215941 Thursday      FALSE
2     511029 2013-01-17       1 00:00 0.5677445 Thursday      FALSE
3     511030 2013-01-17       1 00:00 0.6065458 Thursday      FALSE
4     511033 2013-01-08       2 00:00 0.3090885  Tuesday      FALSE
5     511034 2013-01-17       1 00:00 0.5263230 Thursday      FALSE
6     511035 2013-01-19       1 00:00 0.5267718 Saturday       TRUE

Data

df <- structure(list(Substation = c(511016L, 511029L, 511030L, 511033L, 
511034L, 511035L), Date = c("2013-01-17", "2013-01-17", "2013-01-17", 
"2013-01-08", "2013-01-17", "2013-01-19"), cluster = c(1L, 1L, 
1L, 2L, 1L, 1L), Time = c("00:00", "00:00", "00:00", "00:00", 
"00:00", "00:00"), Value = c(0.6215941, 0.5677445, 0.6065458, 
0.3090885, 0.526323, 0.5267718), weekday = c("Thursday", "Thursday", 
"Thursday", "Tuesday", "Thursday", "Saturday"), is_weekend = c(FALSE, 
FALSE, FALSE, FALSE, FALSE, TRUE)), row.names = c(NA, -6L), class = "data.frame")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM