
How to create a variable conditional on a string in R

I am trying to create the variable 'active' in a data frame (df) in R, conditional on whether values appear in a character vector (act_users). If the names in the screen_name and rt_name columns of df are in the vector, I would like the variable to take the value 1; if not, 0.

df <- data.frame("screen_name" = c("august", "berit", "christopher", "david", "erica", "frank"), "rt_name" = c("berit", "august", "david", "erica", "frank", "christopher"))

act_users <- c("david", "august", "berit")

I have tried the following if/else statements, but none of them work:

'%!in%' <- function(x,y)!('%in%'(x,y))#create a function 

df$active <- ifelse((df$screen_name %in% act_users) & (df$rt_name %in% act_users), 1, 
                         ifelse((df$screen_name %!in% act_users) & (df$rt_name %!in% act_users), 2))

#attempts only with screenname
df$active <- ifelse(df$screen_name %in% act_users, "1", ifelse(df$screen_name %!in% act_users, "0"))


df$active <- if(df$screen_name %in% act_users){
  df$active == 1
} else {
  df$active == 0}

My last resort would be to turn the active-user vector into a data frame, merge the results, and match the columns inside the data frame. But my data is quite big, so a more efficient solution would be nice.

Thanks in advance!

If it is an exact match you can use:

df$active = apply(df,1,function(i)as.numeric(all(i %in% act_users)))

apply() takes every row, and %in% returns TRUE/FALSE for whether each value in that row is an element of act_users. all() then gives 1 (after as.numeric) only if all of those logicals are TRUE.

  screen_name     rt_name active
1      august       berit      1
2       berit      august      1
3 christopher       david      0
4       david       erica      0
5       erica       frank      0
6       frank christopher      0
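
To see what happens inside the function for a single row, here is a small sketch. The names row1, row3, r1 and r3 are made up for illustration; the rows are written out by hand rather than taken from df.

```r
act_users <- c("david", "august", "berit")

# Two rows written out as character vectors, the way apply() hands them over
row1 <- c(screen_name = "august", rt_name = "berit")
row3 <- c(screen_name = "christopher", rt_name = "david")

row1 %in% act_users   # both names are active users
row3 %in% act_users   # only the second name is an active user

# all() collapses the per-column logicals; as.numeric() turns TRUE/FALSE into 1/0
r1 <- as.numeric(all(row1 %in% act_users))
r3 <- as.numeric(all(row3 %in% act_users))
r1
r3
```

Row 3 ends up as 0 because a single FALSE is enough for all() to return FALSE.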

You could also use the code below, which could be faster than apply(df,1,...) when you have many rows:

  • Solution 1:
df$active <- Reduce("*",lapply(df, function(x) ifelse(x %in% act_users,1,0)))
  • Solution 2:
df <- within(df, active <- ifelse(screen_name%in%act_users & rt_name%in%act_users,1,0))

Output

> df
  screen_name     rt_name active
1      august       berit      1
2       berit      august      1
3 christopher       david      0
4       david       erica      0
5       erica       frank      0
6       frank christopher      0
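
If you want to check the speed difference on your own machine, a rough sketch like the following could be used. The data frame big, its size n, and the name pool are invented for the test; timings will vary, so none are shown here.

```r
set.seed(1)
n <- 50000  # arbitrary size, chosen only for this illustration
pool <- c("august", "berit", "christopher", "david", "erica", "frank")
big <- data.frame(screen_name = sample(pool, n, replace = TRUE),
                  rt_name = sample(pool, n, replace = TRUE),
                  stringsAsFactors = FALSE)
act_users <- c("david", "august", "berit")

# Row-wise version: calls the function once per row
t_apply <- system.time(
  a1 <- apply(big, 1, function(i) as.numeric(all(i %in% act_users)))
)

# Vectorised version: one %in% call per column
t_vec <- system.time(
  a2 <- as.numeric(big$screen_name %in% act_users & big$rt_name %in% act_users)
)

print(t_apply)
print(t_vec)
all(a1 == a2)  # both approaches should agree
```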

If you want to check only two columns, you can use %in% on both the columns and combine the result.

df$active <- +(df$screen_name %in% act_users & df$rt_name %in% act_users)
df

#  screen_name     rt_name active
#1      august       berit      1
#2       berit      august      1
#3 christopher       david      0
#4       david       erica      0
#5       erica       frank      0
#6       frank christopher      0

The + at the beginning converts the logical values to integer values.
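
A minimal illustration of that conversion, using a made-up logical vector x; as.integer() is shown as a more explicit alternative:

```r
x <- c(TRUE, FALSE, TRUE)

plus_x <- +x             # unary plus coerces logical to integer
int_x  <- as.integer(x)  # explicit equivalent

class(plus_x)
identical(plus_x, int_x)
```

Both forms give the same result, so which one to use is mostly a matter of readability.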

