
Assigning values to a column based on the values of other columns in the same data frame in R

I have a data frame with 3 columns, and I want to assign values to a fourth column of this data frame depending on whether a condition on the other columns is met in the same row. In this example, I want to assign 1 to df[,4] if the row sum of the time columns (df[,2] + df[,3]) is >= 2.

An example of what I want as the output is:

[Image: example of the desired output (not available)]

Any help is appreciated.

Thank you,

Do you want to assign 1 if both time1 and time2 are 1?

If there are only two columns, you can do:

df$label <- as.integer(df$time1 == 1 & df$time2 == 1)

If there are many such time columns, we can use rowSums():

cols <- grep('time', names(df))
df$label <- as.integer(rowSums(df[cols] == 1) == length(cols))
df

#  a time1 time2 label
#1 a     1     1     1
#2 b     1     0     0
#3 c     1     1     1
#4 d     0     1     0
#5 e     0     0     0

data

Images are not the right way to share data; please provide it in a reproducible format.

df <- data.frame(a = letters[1:5], 
                 time1 = c(1, 1, 1, 0, 0), 
                 time2 = c(1, 0, 1, 1, 0))
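The question states the rule as a row sum reaching a threshold (>= 2) rather than every time column being 1. For two 0/1 columns the two rules agree, but a minimal sketch of the threshold form, reusing the same df and assuming every column whose name contains "time" is numeric, looks like this (threshold is a hypothetical variable name added for illustration):

```r
df <- data.frame(a = letters[1:5],
                 time1 = c(1, 1, 1, 0, 0),
                 time2 = c(1, 0, 1, 1, 0))

# Indices of the columns whose names contain "time" (assumed numeric)
cols <- grep("time", names(df))

# Hypothetical threshold taken from the question: row sum must be >= 2
threshold <- 2

# rowSums() adds the time columns for every row at once;
# as.integer() turns the TRUE/FALSE comparison into 1/0
df$label <- as.integer(rowSums(df[cols]) >= threshold)
df
```

Changing threshold lets the same line express "at least k of the time columns are 1" without listing the columns by hand.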
library(tidyverse)

data <- 
    tribble(
        ~ID, ~time1, ~time2,
        'jkjkdf', 1, 1,
        'kjkj', 1, 0,
        'fgf', 1, 1,
        'jhkj', 0, 1, 
        'hgd', 0,0
    )

mutate(data, label = if_else(time1 + time2 >= 2, 1, 0))
#> # A tibble: 5 x 4
#>   ID     time1 time2 label
#>   <chr>  <dbl> <dbl> <dbl>
#> 1 jkjkdf     1     1     1
#> 2 kjkj       1     0     0
#> 3 fgf        1     1     1
#> 4 jhkj       0     1     0
#> 5 hgd        0     0     0

# or, with n time columns

data %>%
    rowwise() %>% 
    mutate(label = if_else(sum(across(starts_with('time'))) >= 2, 1, 0))
#> # A tibble: 5 x 4
#> # Rowwise: 
#>   ID     time1 time2 label
#>   <chr>  <dbl> <dbl> <dbl>
#> 1 jkjkdf     1     1     1
#> 2 kjkj       1     0     0
#> 3 fgf        1     1     1
#> 4 jhkj       0     1     0
#> 5 hgd        0     0     0

Created on 2021-06-06 by the reprex package (v2.0.0)
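rowwise() evaluates the expression once per row, which can get slow on large data. A vectorized sketch of the same rule, assuming dplyr >= 1.1.0 (the release that added pick()), computes all row sums in a single call instead:

```r
library(dplyr)

data <- tibble(
  ID    = c("jkjkdf", "kjkj", "fgf", "jhkj", "hgd"),
  time1 = c(1, 1, 1, 0, 0),
  time2 = c(1, 0, 1, 1, 0)
)

# pick() returns the selected columns as a data frame, so rowSums()
# adds them for every row at once; no rowwise() grouping is needed
result <- data %>%
  mutate(label = if_else(rowSums(pick(starts_with("time"))) >= 2, 1, 0))
result
```

Because nothing is grouped, the result is an ordinary tibble, so there is no rowwise grouping to remove with ungroup() afterwards.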

We could do this in a vectorized way using tidyverse methods: select the columns whose names start with 'time' (starts_with), reduce them to a single vector by adding ( + ) the corresponding elements, and then use the aliases from magrittr to convert the result to binary for the 'label' column. Finally, assign the result ( <- ) back to the original object if you want the original data to be changed.

library(dplyr)
library(purrr)
library(magrittr)
df %>%
    # pick() (dplyr >= 1.1.0) replaces cur_data(), which that release deprecated
    mutate(label = pick(starts_with('time')) %>%
               reduce(`+`) %>% 
               is_weakly_greater_than(2) %>% 
               multiply_by(1))
  a time1 time2 label
1 a     1     1     1
2 b     1     0     0
3 c     1     1     1
4 d     0     1     0
5 e     0     0     0

data

df <- structure(list(a = c("a", "b", "c", "d", "e"), time1 = c(1, 1, 
1, 0, 0), time2 = c(1, 0, 1, 1, 0)), class = "data.frame", row.names = c(NA, 
-5L))
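The reduce(`+`) step folds the list of time columns into one vector by adding them element by element. A small illustration using base R's Reduce(), which behaves like purrr::reduce() for this purpose, with the two time columns from the data typed in by hand:

```r
# The time columns as a list: the shape that reduce()/Reduce() folds over
time_cols <- list(time1 = c(1, 1, 1, 0, 0),
                  time2 = c(1, 0, 1, 1, 0))

# Fold with `+`: time1 + time2 (+ time3 + ... for more columns)
row_total <- Reduce(`+`, time_cols)
row_total
#> [1] 2 1 2 1 0

# is_weakly_greater_than(2) is magrittr's alias for `>= 2`, and
# multiply_by(1) is `* 1`, which turns TRUE/FALSE into 1/0
label <- (row_total >= 2) * 1
label
#> [1] 1 0 1 0 0
```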

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM