I have a data frame where some column headers are plain names and others are dates. I want to select all the columns whose headers are not dates, plus all the columns whose header date is earlier than the current system date (Sys.Date()). How can I do this using a select statement in dplyr?
Below is the data frame:
> dput(job_times[1:5,])
structure(list(Skill = c("KAC", "KAC", "KAC", "KAC", "KAC"),
Patch = c("A1", "A2", "A3", "A4", "A5"), `Work Code` = c("W01",
"W01", "W01", "W01", "W01"), Product = c("KAC Repair", "KAC Repair",
"KAC Repair", "KAC Repair", "KAC Repair"), `Visit Time` = c(45.68,
42.55, 46.45, 51.86, 43.49), Travel = c(32.5, 21.66, 26.33,
28.63, 27.03), Success = c(0.69, 0.66, 0.67, 0.65, 0.67),
`Completion Time` = c(1.9, 1.61, 1.8, 2.05, 1.74), `28-12-2020` = c(1.9,
1.61, 1.8, 2.05, 1.74), `04-01-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `11-01-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`18-01-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `25-01-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `01-02-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `08-02-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`15-02-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `22-02-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `01-03-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `08-03-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`15-03-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `22-03-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `29-03-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `05-04-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`12-04-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `19-04-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `26-04-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `03-05-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`10-05-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `17-05-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `24-05-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `31-05-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`07-06-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `14-06-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `21-06-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `28-06-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`05-07-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `12-07-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `19-07-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `26-07-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`02-08-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `09-08-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `16-08-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `23-08-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`30-08-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `06-09-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `13-09-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `20-09-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`27-09-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `04-10-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `11-10-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `18-10-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`25-10-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `01-11-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `08-11-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `15-11-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`22-11-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `29-11-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `06-12-2021` = c(1.9, 1.61, 1.8,
2.05, 1.74), `13-12-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`20-12-2021` = c(1.9, 1.61, 1.8, 2.05, 1.74), `27-12-2021` = c(1.9,
1.61, 1.8, 2.05, 1.74), `03-01-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `10-01-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`17-01-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `24-01-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74), `31-01-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `07-02-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`14-02-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `21-02-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74), `28-02-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `07-03-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`14-03-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `21-03-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74), `28-03-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `04-04-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`11-04-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `18-04-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74), `25-04-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `02-05-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`09-05-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `16-05-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74), `23-05-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `30-05-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`06-06-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `13-06-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74), `20-06-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `27-06-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`04-07-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `11-07-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74), `18-07-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `25-07-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`01-08-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `08-08-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74), `15-08-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `22-08-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`29-08-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `05-09-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74), `12-09-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `19-09-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`26-09-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `03-10-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74), `10-10-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `17-10-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`24-10-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `31-10-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74), `07-11-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `14-11-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`21-11-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `28-11-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74), `05-12-2022` = c(1.9, 1.61, 1.8,
2.05, 1.74), `12-12-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74),
`19-12-2022` = c(1.9, 1.61, 1.8, 2.05, 1.74), `26-12-2022` = c(1.9,
1.61, 1.8, 2.05, 1.74)), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"))
I want the Skill, Patch, Work Code, Product, Visit Time, Travel, Success and Completion Time columns, along with all the columns whose header date is less than or equal to Sys.Date(), using dplyr and select statements.
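This is how I would solve it:
# Column names that look like dd-mm-yyyy dates
cols <- grep('^\\d{2}-\\d{2}-\\d{4}$', names(job_times), value = TRUE)
# Keep the non-date columns plus the date columns before today
# (use >= instead of > to also include today's date)
result <- job_times[, c(setdiff(names(job_times), cols),
                        cols[Sys.Date() > as.Date(cols, '%d-%m-%Y')])]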
You can integrate this into a dplyr pipe as follows:
library(dplyr)
job_times %>%
  select({
    # Column names that look like dd-mm-yyyy dates
    cols <- grep('^\\d{2}-\\d{2}-\\d{4}$', names(.), value = TRUE)
    c(setdiff(names(.), cols),
      cols[Sys.Date() > as.Date(cols, '%d-%m-%Y')])
  })
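To check the logic without depending on the day you run it, here is a small sketch on a toy data frame with a fixed cutoff date. The names toy and cutoff are made up for illustration, and the comparison is written as <= so that a column dated exactly on the cutoff is kept, which is what the question asks for.

```r
# Toy data: two non-date columns and three date-named columns (hypothetical).
toy <- data.frame(
  Skill = c("KAC", "KAC"),
  Patch = c("A1", "A2"),
  `28-12-2020` = c(1.9, 1.61),
  `04-01-2021` = c(1.9, 1.61),
  `11-01-2021` = c(1.9, 1.61),
  check.names = FALSE  # keep the date-like headers unchanged
)

# Fixed cutoff used in place of Sys.Date() so the result is reproducible.
cutoff <- as.Date("2021-01-05")

# Headers that look like dd-mm-yyyy.
date_cols <- grep("^\\d{2}-\\d{2}-\\d{4}$", names(toy), value = TRUE)

# Non-date columns first, then the date columns on or before the cutoff.
keep <- c(setdiff(names(toy), date_cols),
          date_cols[as.Date(date_cols, "%d-%m-%Y") <= cutoff])

result <- toy[, keep]
names(result)
```

Swapping cutoff for Sys.Date() and toy for job_times gives the same selection as the code above, except that today's column is included.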
I would suggest creating a helper function and then you can use select like this:
library(tidyverse)
library(lubridate)
# TRUE for names that parse as a dd-mm-yyyy date before today, FALSE otherwise
is_before_today <- function(x) {
  (dmy(x, quiet = TRUE) < Sys.Date()) %>% coalesce(FALSE)
}
job_times %>%
  select(
    matches("^\\D"), all_of(colnames(.) %>% keep(is_before_today))
  )
#> # A tibble: 5 x 38
#> Skill Patch `Work Code` Product `Visit Time` Travel Success `Completion Tim~
#> <chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 KAC A1 W01 KAC Repa~ 45.7 32.5 0.69 1.9
#> 2 KAC A2 W01 KAC Repa~ 42.6 21.7 0.66 1.61
#> 3 KAC A3 W01 KAC Repa~ 46.4 26.3 0.67 1.8
#> 4 KAC A4 W01 KAC Repa~ 51.9 28.6 0.65 2.05
#> 5 KAC A5 W01 KAC Repa~ 43.5 27.0 0.67 1.74
#> # ... with 30 more variables: 28-12-2020 <dbl>, 04-01-2021 <dbl>,
#> # 11-01-2021 <dbl>, 18-01-2021 <dbl>, 25-01-2021 <dbl>, 01-02-2021 <dbl>,
#> # 08-02-2021 <dbl>, 15-02-2021 <dbl>, 22-02-2021 <dbl>, 01-03-2021 <dbl>,
#> # 08-03-2021 <dbl>, 15-03-2021 <dbl>, 22-03-2021 <dbl>, 29-03-2021 <dbl>,
#> # 05-04-2021 <dbl>, 12-04-2021 <dbl>, 19-04-2021 <dbl>, 26-04-2021 <dbl>,
#> # 03-05-2021 <dbl>, 10-05-2021 <dbl>, 17-05-2021 <dbl>, 24-05-2021 <dbl>,
#> # 31-05-2021 <dbl>, 07-06-2021 <dbl>, 14-06-2021 <dbl>, 21-06-2021 <dbl>,
#> # 28-06-2021 <dbl>, 05-07-2021 <dbl>, 12-07-2021 <dbl>, 19-07-2021 <dbl>
Created on 2021-07-20 by the reprex package (v1.0.0)
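To see what the helper is doing, here is a minimal sketch on a few header names with a fixed date standing in for Sys.Date(). The vector headers and the value cutoff are made up for illustration.

```r
library(dplyr)      # coalesce()
library(lubridate)  # dmy()

headers <- c("Skill", "Work Code", "28-12-2020", "04-01-2021")
cutoff  <- as.Date("2021-01-01")  # fixed stand-in for Sys.Date()

# dmy() returns NA for names that are not dd-mm-yyyy dates.
parsed <- dmy(headers, quiet = TRUE)

# Comparing NA with a date gives NA; coalesce() turns those into FALSE.
before <- coalesce(parsed < cutoff, FALSE)

# Names that start with a non-digit are what matches("^\\D") picks up.
non_date <- grepl("^\\D", headers)

data.frame(headers, before, non_date)
```

Each header ends up flagged by exactly one of the two tests, or by neither when it is a date on or after the cutoff, so the two parts of the select call do not overlap.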
Ronak Shah's answer is great. Here is how I would do it.
## Get the list of all column names (TestDF is your data frame, e.g. job_times)
ColumnNames <- names(TestDF)
## Retain only those that contain letters, i.e. are not dates
CharacterColumnNames <- ColumnNames[grepl("[[:alpha:]]", ColumnNames)]
## Get the list of all date column names
DateColumns <- setdiff(ColumnNames, CharacterColumnNames)
## Keep only the date column names before today
RequiredDateColumns <- DateColumns[Sys.Date() > as.Date(DateColumns, '%d-%m-%Y')]
## Get the modified data frame
ModifiedDF <- TestDF[, c(CharacterColumnNames, RequiredDateColumns)]
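If you need this more than once, the same steps can be wrapped in a small function that ends in a dplyr select call, as the question asks. This is only a sketch: keep_cols_up_to and its cutoff and format arguments are names I made up, and it relies on as.Date() returning NA for headers that do not parse with the given format.

```r
library(dplyr)

# Hypothetical helper: keep non-date columns plus date-named columns
# whose date is on or before `cutoff`.
keep_cols_up_to <- function(data, cutoff = Sys.Date(), format = "%d-%m-%Y") {
  # With an explicit format, non-date headers parse to NA instead of erroring.
  parsed <- as.Date(names(data), format = format)
  keep <- names(data)[is.na(parsed) | parsed <= cutoff]
  # all_of() selects by an external character vector of column names.
  select(data, all_of(keep))
}

# Usage on a toy data frame (made up for illustration).
toy <- data.frame(
  Skill = "KAC",
  `28-12-2020` = 1.9,
  `11-01-2021` = 1.9,
  check.names = FALSE
)
out <- keep_cols_up_to(toy, cutoff = as.Date("2021-01-05"))
names(out)
```

Because the kept names are taken from names(data) in order, the original column order is preserved. Calling keep_cols_up_to(job_times) uses today's date as the cutoff.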