
Create new column based on another variable

I have a dataframe with several columns. One of them is the column participant, where different participant codes are listed. These are all either in the 100 range, the 200 range or the 500 range. For example: 101, 203, 209, 504, 103, 512 and so on.

I want to create an extra column in the dataframe called group with 3 possible values: 100, 200 and 500. Thus, depending on the number a participant code starts with, it will be assigned one of these 3 labels.

I have tried using a combination of startsWith() and ifelse statements, but I can't make it work.

data$group = ifelse(startsWith(as.character(data$participant), "1"), "100", 
                    ((ifelse(startsWith(as.character(data$participant), "2"), "200",
                           (ifelse(startsWith(as.character(data$participant), "5"), "500")), NULL)))

A simple tidyverse solution (similar to s__'s solution):

library(tibble)

tibble(
  participant = c(101, 203, 209, 504, 103, 512),
  group = round(participant, -2)
)

# A tibble: 6 x 2
  participant group
        <dbl> <dbl>
1         101   100
2         203   200
3         209   200
4         504   500
5         103   100
6         512   500

Based on your examples and comments it looks like you want to divide a numeric value into ranges and assign a character label.

case_when provides a straightforward option. It takes longer to type, but it may be more readable for people unfamiliar with cut or more mathematical approaches.

library(dplyr)

tibble(old = c(101, 203, 209, 504, 103, 512)) %>%
    mutate(
        new = case_when(
            old < 100 ~ NA_character_,
            old < 200 ~ "100",
            old < 300 ~ "200",
            old < 400 ~ "300",
            old < 500 ~ "400",
            old < 600 ~ "500",
            TRUE ~ NA_character_
        )
    )

Result

# A tibble: 6 x 2
    old new  
  <dbl> <chr>
1   101 100  
2   203 200  
3   209 200  
4   504 500  
5   103 100  
6   512 500 

That said, the cut function was designed to do precisely what you described, and has an option to specify the output labels.

old <- c(101, 203, 209, 504, 103, 512)

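new <- cut(
    x = old, 
    breaks = seq(from = 100, to = 600, by = 100), 
    labels = seq(from = 100, to = 500, by = 100),
    # right = FALSE makes each interval [lower, upper), so a code of
    # exactly 200 is labelled "200" rather than "100"
    right = FALSE
)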

as.character(new)

Result

[1] "100" "200" "200" "500" "100" "500"

Maybe this can be done more easily with integer division:

(data$participant %/% 100) * 100
#[1] 100 200 200 500 100 500

In the OP's code, the last 'no' argument should be NA_character_ rather than NULL, because NULL has length 0. For example:

 v1 <- c(10, 20, 5, 2, 40)
 ifelse(v1 > 50, 3, NULL)

Error in ans[npos] <- rep(no, length.out = len)[npos]:
  replacement has length zero
In addition: Warning message:
In rep(no, length.out = len): 'x' is NULL so the result will be NULL

ifelse(v1 > 50, 3, NA)
#[1] NA NA NA NA NA
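
Applied to the OP's nested call, a minimal sketch of a corrected version might look like the following. The intermediate variable code is introduced here only for readability and is not part of the original; the sample data repeats the codes from the question.

```r
# Sample data with the participant codes from the question
data <- data.frame(participant = c(101, 203, 209, 504, 103, 512))

# Convert once instead of repeating as.character() in every test
code <- as.character(data$participant)

# Each ifelse() gets both a 'yes' and a 'no' value; the innermost
# 'no' is NA_character_ (length 1) rather than NULL (length 0)
data$group <- ifelse(startsWith(code, "1"), "100",
              ifelse(startsWith(code, "2"), "200",
              ifelse(startsWith(code, "5"), "500", NA_character_)))

data$group
# [1] "100" "200" "200" "500" "100" "500"
```

Any code that does not start with 1, 2 or 5 ends up as NA instead of raising an error.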

data

data <- structure(list(participant = c(101, 203, 209, 504, 103, 512)), 
     class = "data.frame", row.names = c(NA, -6L))

You can also manage it with round():

x <- c(101, 203, 209, 504, 103, 512)
round(x, -2)
[1] 100 200 200 500 100 500

In your case:

data$group <- round(data$participant, -2)
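
One thing worth checking before relying on round(): it rounds to the nearest hundred, so it matches the "starts with" rule only while the last two digits stay below 50. The sketch below uses made-up codes (not from the question) to compare it with integer division, which always rounds down.

```r
# Hypothetical codes that all belong to the 100 group
codes <- c(101, 149, 160, 199)

# round() goes to the nearest hundred, so 160 and 199 land in 200
rounded <- round(codes, -2)
rounded
# [1] 100 100 200 200

# Integer division drops the last two digits, so all stay in 100
floored <- (codes %/% 100) * 100
floored
# [1] 100 100 100 100
```

If the participant codes never go past x49, both approaches give the same result.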

Using ifelse:

data$group <- ifelse(data$participant >= 100 & data$participant < 200, 100,
                     ifelse(data$participant >= 200 & data$participant < 300, 200, 500))

Result:

data
  participant group
1         101   100
2         203   200
3         209   200
4         504   500
5         103   100
6         512   500

Another option you can try, using data.table:

library(data.table)
df <- data.table(participants=c(101, 203, 209, 504, 103, 512))
df[,groups:= (participants - participants%%100)]
   participants groups
1:          101    100
2:          203    200
3:          209    200
4:          504    500
5:          103    100
6:          512    500

This is not exactly what you asked for, but you can use the cut function too. For instance, in data.table it may look like this:

library(data.table)

df <- data.table(participants = c(101, 203, 209, 504, 103, 512))
df[, groups:=cut(participants, seq(100,600,100))]

   participants    groups
1:          101 (100,200]
2:          203 (200,300]
3:          209 (200,300]
4:          504 (500,600]
5:          103 (100,200]
6:          512 (500,600]
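
To get the 100 / 200 / 500 labels the question asks for rather than interval names, one possible variation is to pass labels to cut() and close the intervals on the left. This is a sketch under the assumption that the codes fall between 100 and 599; the helper vector brks is a name chosen here for illustration.

```r
library(data.table)

df <- data.table(participants = c(101, 203, 209, 504, 103, 512))

# Break points 100, 200, ..., 600 (hypothetical helper name)
brks <- seq(100, 600, 100)

# right = FALSE gives intervals [100,200), [200,300), ...;
# each interval is labelled with its lower bound
df[, groups := as.character(cut(participants, breaks = brks,
                                labels = head(brks, -1), right = FALSE))]

df$groups
# [1] "100" "200" "200" "500" "100" "500"
```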

It's rather verbose but it's just another way:

library(dplyr)

participant <- c(101, 203, 209, 504, 103, 512)

df <- tibble(participant)

df %>%
  mutate(group = case_when(
    participant %in% 100:199 ~ 100,
    participant %in% 200:299 ~ 200,
    participant %in% 500:599 ~ 500
  ))

# A tibble: 6 x 2
  participant group
        <dbl> <dbl>
1         101   100
2         203   200
3         209   200
4         504   500
5         103   100
6         512   500

