简体   繁体   中英

Finding if the values in one column are within the range of several other columns

I am looking for an easy way to find whether the value in a column lies within the range of values in other columns.

My input looks like this:

ID  "Q1 Comm - 01 Scope Thesis" "Q1 Comm - 02 Scope Project" "Q1 Comm - 03 Learn Intern"    "Q1 Comm - 04 Biography"    "Q1 Comm - Overall Plan"
10   NA                          NA                           4                              NA      4
31   2                           NA                           NA                             NA      2
225  0                           NA                           NA                             NA      1
243  NA                          2                            NA                             1       0
310  NA                          2                            NA                             1       NA

For each unique ID , I am interested in identifying when the column Q1 Comm - Overall Plan is:

1 - Below the min() of all the other columns, or

2 - Above the max() of all the other columns, or

3 - Within the range of all the other columns

The complete list of columns (along with the overall column) is below:

"Q1 Comm - 01 Scope Thesis"
"Q1 Comm - 02 Scope Project"
"Q1 Comm - 03 Learn Intern"
"Q1 Comm - 04 Biography"
"Q1 Comm - 05 Exhibit"
"Q1 Comm - 06 Social Act"
"Q1 Comm - 07 Post Project"
"Q1 Comm - 08 Learn Plant"
"Q1 Comm - 09 Study Narrate"
"Q1 Comm - 10 Learn Participate"
"Q1 Comm - 11 Write 1"
"Q1 Comm - 12 Read 2"
"Q1 Comm - Overall Plan"

My required output is something like this:

ID  "Q1 Comm - 01 Scope Thesis" "Q1 Comm - 02 Scope Project" "Q1 Comm - 03 Learn Intern"    "Q1 Comm - 04 Biography"    "Q1 Comm - Overall Plan" "Q1_check"
10   NA                          NA                           4                              NA      4 "within"
31   2                           NA                           NA                             NA      2 "within"
225  0                           NA                           NA                             NA      1 "above"
243  NA                          2                            NA                             1       0 "below"
310  NA                          2                            NA                             1       NA NA

The dput() for my data frame df is below.

dput(df)

structure(list(ID = c(10L, 31L, 225L, 243L), Q1.Comm...01.Scope.Thesis = c(NA, 
2L, 0L, NA), Q1.Comm...02.Scope.Project = c(NA, NA, NA, 2L), 
    Q1.Comm...03.Learn.Intern = c(4L, NA, NA, NA), Q1.Comm...04.Biography = c(NA, 
    NA, NA, 1L), Q1.Comm...Overall.Plan = c(4L, 1L, 2L, 
    NA), X = c(NA, NA, NA, NA), X.1 = c(NA, NA, NA, NA), X.2 = c(NA, 
    NA, NA, NA)), class = "data.frame", row.names = c(NA, -4L
))

Note:

I had asked this question here Finding if a value is within the range of other columns but the example was too simplified and none of the solutions worked for me.

The question was getting too lengthy, therefore, for the sake of clarity, I am posting this as a new question.

I thank you for your time and help on this post.

You might try something like this with rowwise and c_across :

library(dplyr)
df %>%
  rowwise %>%
  summarise(ID = ID,
            Max = `Q1.Comm...Overall.Plan` > max(c_across(-c(ID,`Q1.Comm...Overall.Plan`)),na.rm = TRUE),
            Min = `Q1.Comm...Overall.Plan` < min(c_across(-c(ID,`Q1.Comm...Overall.Plan`)),na.rm = TRUE),
            Range = `Q1.Comm...Overall.Plan` >= range(c_across(-c(ID,`Q1.Comm...Overall.Plan`)),na.rm = TRUE)[1] &
                    `Q1.Comm...Overall.Plan` <= range(c_across(-c(ID,`Q1.Comm...Overall.Plan`)),na.rm = TRUE)[2]) %>%
  mutate(Result = case_when(Max ~ "above",
                            Min ~ "below",
                            Range ~ "within",
                            TRUE ~ NA_character_))
# A tibble: 4 x 5
     ID Max   Min   Range Result
  <int> <lgl> <lgl> <lgl> <chr> 
1    10 FALSE FALSE TRUE  within
2    31 FALSE FALSE TRUE  within
3   225 TRUE  FALSE FALSE above 
4   243 NA    NA    NA    NA    

You can change summarise to mutate to keep the original columns and/or select to get rid of them.

See the dplyr rowwise tutorial for more information.

library(purrr)
library(data.table)

needed_cols <- setdiff(names(df), c("ID", "Q1.Comm...Overall.Plan"))

setDT(df)[, c("min", "max") := transpose(pmap(.SD, range, na.rm = TRUE)), .SDcols = needed_cols]
df[, Q1_check := fcase(
    is.na(`Q1.Comm...Overall.Plan`), NA_character_,
    `Q1.Comm...Overall.Plan` < min, "below",
    `Q1.Comm...Overall.Plan` > max, "above",
    default = "within"
  )
]
df[, c("max", "min") := NULL]

I have revised your dput to suit to your requirements as discussed in the linked question. I think this will help you. I used janitor::clean_names() which I suggest you to use before proceeding so that your column names are cleaned.

So the revised dput is

df <- structure(list(id = c(10L, 31L, 225L, 243L), q1_comm_01_scope_thesis = c(NA, 
2L, 0L, NA), q1_comm_02_scope_project = c(NA, NA, NA, 2L), q1_comm_03_learn_intern = c(4L, 
NA, NA, NA), q1_comm_04_biography = c(NA, NA, NA, 1L), q1_comm_overall_plan = c(4L, 
1L, 2L, NA), q2_comm_01_scope_thesis = c(NA, 4, 0, NA), q2_comm_02_scope_project = c(NA, 
NA, NA, 4), q2_comm_03_learn_intern = c(8, NA, NA, NA), q2_comm_04_biography = c(NA, 
NA, NA, 2), q2_comm_overall_plan = c(8, 2, 4, NA)), row.names = c(NA, 
-4L), class = "data.frame")

df
   id q1_comm_01_scope_thesis q1_comm_02_scope_project q1_comm_03_learn_intern q1_comm_04_biography q1_comm_overall_plan q2_comm_01_scope_thesis
1  10                      NA                       NA                       4                   NA                    4                      NA
2  31                       2                       NA                      NA                   NA                    1                       4
3 225                       0                       NA                      NA                   NA                    2                       0
4 243                      NA                        2                      NA                    1                   NA                      NA
  q2_comm_02_scope_project q2_comm_03_learn_intern q2_comm_04_biography q2_comm_overall_plan
1                       NA                       8                   NA                    8
2                       NA                      NA                   NA                    2
3                       NA                      NA                   NA                    4
4                        4                      NA                    2                   NA

Now proceed as suggested. You have to modify [-5] inside cur_data() to suit to your requirements (as per the relative location of overall_column I think 9 in your case)

library(tidyverse)

split.default(df[-1], gsub('(q\\d*)(.*)', '\\1', names(df[-1]), perl = T)) %>%
  map(., ~ .x %>% bind_cols('id' = df$id) %>%
        group_by(id) %>%
        mutate(across(ends_with('_overall_plan'), ~ case_when(. < min(cur_data()[-5], na.rm = T) ~ 'below',
                                                              . > max(cur_data()[-5], na.rm = T) ~ 'above',
                                                              is.na(.) ~ NA_character_,
                                                              TRUE ~ 'within'),
                      .names = '{str_remove(.col,"_comm_overall_plan")}_check'))
        ) %>%
  reduce(left_join, by = 'id')

# A tibble: 4 x 13
# Groups:   id [4]
  q1_comm_01_scop~ q1_comm_02_scop~ q1_comm_03_lear~ q1_comm_04_biog~ q1_comm_overall~    id q1_check q2_comm_01_scop~ q2_comm_02_scop~ q2_comm_03_lear~ q2_comm_04_biog~
             <int>            <int>            <int>            <int>            <int> <int> <chr>               <dbl>            <dbl>            <dbl>            <dbl>
1               NA               NA                4               NA                4    10 within                 NA               NA                8               NA
2                2               NA               NA               NA                1    31 below                   4               NA               NA               NA
3                0               NA               NA               NA                2   225 above                   0               NA               NA               NA
4               NA                2               NA                1               NA   243 NA                     NA                4               NA                2
# ... with 2 more variables: q2_comm_overall_plan <dbl>, q2_check <chr>

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM